Preface



Since 1995, when the SPIN workshop series was instigated, SPIN workshops 
have been held on an annual basis in Montreal (1995), New Brunswick (1996), 
Enschede (1997), Paris (1998), Trento (1999), Toulouse (1999), Stanford (2000), 
Toronto (2001), Grenoble (2002) and Portland (2003). All but the first SPIN 
workshop were organized as satellite events of larger conferences, in particular 
of CAV (1996), TACAS (1997), FORTE/PSTV (1998), FLOC (1999), the World 
Congress on Formal Methods (1999), FMOODS (2000), ICSE (2001, 2003) and 
ETAPS (2002). This year again, SPIN was held as a satellite event of ETAPS 
2004. The co-location of SPIN workshops with conferences has proven to be 
very successful and has helped to disseminate SPIN model checking technology 
to wider audiences. Since 1999, the proceedings of the SPIN workshops have
appeared in Springer-Verlag’s Lecture Notes in Computer Science series.

The history of successful SPIN workshops is evidence for the maturing of 
model checking technology, not only in the hardware domain, but increasingly 
also in the software area. While in earlier years algorithms and tool development 
around the SPIN model checker were the focus of this workshop series, for several 
years now the scope has been widened to include more general approaches to 
software model checking techniques and tools as well as applications. 

The SPIN workshop has become a forum for all practitioners and researchers 
interested in model checking based techniques for the validation and analysis 
of communication protocols and software systems. Techniques based on expli- 
cit representations of state spaces, as implemented for example in the SPIN 
model checker or other tools, or techniques based on combinations of explicit 
representations with symbolic representations, are the focus of this workshop. 
It has proven to be particularly suitable for analyzing concurrent asynchronous 
systems. The workshop topics include theoretical and algorithmic foundations 
and tools, model derivation from code and code derivation from models, tech- 
niques for dealing with large and infinite state spaces, timing and applications. 
The workshop aims to encourage interactions and exchanges of ideas with all 
related areas in software engineering. 

Papers went through a rigorous reviewing process. Each submitted paper was 
reviewed by three program committee members. Of 48 submissions, 19 research 
papers and 3 tool presentations were selected. Papers for which one of the editors 
was a co-author were handled by a sub-committee chaired by Gerard Holzmann. 

In addition to the refereed papers, four invited talks were given. Three of
these were given by ETAPS invited speakers: Antti Valmari (Tampere, Finland) on
the Rubik’s Cube and what it can tell us about data structures, information
theory and randomization, Mary-Lou Soffa (Pittsburgh, USA) on the foundations
of code optimization, and Robin Milner (Cambridge, UK) on the grand challenge
of building a theory for global ubiquitous computing. Finally, the SPIN invited
speaker Reinhard Wilhelm (Saarbrücken, Germany) gave a talk on the analysis
of timing models by means of abstract interpretation.

This year we took up an initiative started in 2002 and solicited tutorials that 
provided opportunities to get detailed insights into some validation tools and 
the methodologies of their use. Out of 3 submissions, the program committee 
selected 2 tutorials. 

— An “advanced SPIN tutorial” giving an overview of recent extensions of the 
SPIN model checker as well as some methodological advice for its use. It 
was mainly addressed to users who want to use SPIN as a modelling and 
validation environment. 

— A tutorial on the IF validation environment providing an overview of the IF 
modelling language and the main functionalities of the validation toolbox. 
It was addressed to users who want to use IF as a validation environment by 
feeding it with models in the IF language, or in SDL or UML, but also to tool 
developers who would like to interface their tools with the IF environment. 

Acknowledgements. The volume editors wish to thank all members of the 
program committee as well as the external reviewers for their tremendous effort 
that led to the selection of this year’s program. We furthermore wish to thank 
the organizers of ETAPS 2004 for inviting us to hold SPIN 2004 as a satellite 
event and for their support and flexibility in accommodating the particular needs 
of the SPIN workshop. We wish to thank in particular Fernando Orejas and 
Jordi Cortadella. Finally, we wish to thank Springer-Verlag for providing us
with the possibility to use a conference online service free of charge, and the
METAFrame team, in particular Martin Karusseit, for their very valuable and
responsive support.

Formal Analysis of Processor Timing Models



Reinhard Wilhelm* 
Informatik 

Universität des Saarlandes
Saarbrücken



Abstract. Hard real-time systems need methods to determine upper 
bounds for their execution times, usually called worst-case execution 
times. This talk gives an introduction to state-of-the-art timing-analysis
methods. These use Abstract Interpretation to predict the system’s behavior
on the underlying processor’s components and Integer Linear Programming
to determine a worst-case path through the program. The
abstract interpretation is based on an abstract processor model that is
conservative with respect to the timing behavior of the concrete processor.
Ongoing work is reported on analyzing abstract processor models for
properties that have a strong influence on the expected precision of timing
prediction and also on the architecture of the timing-analysis tool.
Some of the properties we are interested in can be model checked. 



1 WCET Determination 

Hard real-time systems need methods to determine upper bounds for their
execution times, usually called worst-case execution times (WCET). Based on these
bounds, a schedulability analysis can check whether the underlying hardware is
fast enough to execute the system’s tasks such that they all finish before their
deadlines. This problem is nontrivial because performance-enhancing
architectural features such as caches, pipelines, and branch prediction introduce “local
non-determinism” into the processor behavior: local inspection of the program
cannot determine what the contribution of an instruction to the program’s
overall execution time is. The execution history determines whether the instruction’s
memory accesses hit or miss the cache, whether the pipeline units needed by the
instruction are occupied or not, and whether branch prediction is correct or not.



2 Tool Architecture 

State-of-the-art timing-analysis methods split the task into (at least) two subtasks:
the prediction of the task’s behavior on the processor components such as caches
and pipelines, formerly called “micro-architecture modeling” [HBW94], and the
determination of a worst-case path. They use Abstract Interpretation to predict
the system’s behavior on the underlying processor’s components and Integer
Linear Programming to determine a worst-case path through the program [LMW99].
A typical tool architecture is that of aiT, the tool developed and marketed
by AbsInt Angewandte Informatik in Saarbrücken, cf. Fig. 1.

* Work reported herein is supported by the Transregional Collaborative Research
Center AVACS of the Deutsche Forschungsgemeinschaft.





Fig. 1. Architecture of the aiT WCET analysis tool (figure not reproduced; the
recoverable component labels are “CFG Builder” and “Loop Trafo”)



The articles [FHL+01, LTH02] report on WCET tool developments for complex
processor architectures, namely the Motorola ColdFire 5307 and the Motorola
PowerPC 755. These were the first fully covered complex processors.

3 Timing Anomalies 

The architecture of WCET-tools and the precision of the results of WCET anal- 
yses strongly depend on the architecture of the employed processor [HLTW03]. 
Out-of-order execution and control speculation introduce interferences between 
processor components, e.g. caches, pipelines, and branch prediction units. These 
interferences forbid modular designs of WCET tools, which would execute the 
subtasks of WCET analysis consecutively. Instead, complex integrated designs
are needed, resulting in high demand for space and analysis time.

In the following, several such properties of processor architectures are
described. They cause the processor to display what are called Timing Anomalies
[Lun02]. Timing anomalies are counter-intuitive influences of the (local)
execution time of one instruction on the (global) execution time of the whole
program. Several processor features can interact in such a way
that a locally faster execution of an instruction leads to a globally longer
execution time of the whole program. This is only the first case of a timing anomaly.
The general case is the following. Different assumptions about the processor’s
execution state, e.g. the fact that the instruction is or is not in the instruction
cache, will result in a difference $\Delta T_1$ of the execution time of the instruction
between these two cases. Either assumption may lead to a difference $\Delta T$ of the
global execution time compared to the other one. We say that a timing anomaly
occurs if either

1. $\Delta T_1 < 0$, i.e., the instruction executes faster, and
   (a) $\Delta T < \Delta T_1$, i.e., the overall execution is accelerated by more than
       the acceleration of the instruction, or
   (b) $\Delta T > 0$, i.e., the program runs longer than before;

or

2. $\Delta T_1 > 0$, i.e., the instruction takes longer to execute, and
   (a) $\Delta T > \Delta T_1$, i.e., the overall execution is extended by more than
       the delay of the instruction, or
   (b) $\Delta T < 0$, i.e., the program takes less time to execute than before.

The case $\Delta T_1 < 0 \wedge \Delta T > 0$ is a critical case for WCET analysis. It makes
it impossible to use local worst-case scenarios for WCET computation. This
necessitates a conservative, i.e., upper approximation to the damages potentially
caused by all cases or forces the analysis to follow all possible scenarios.
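
The case distinction above can be written down as a small predicate. The following is a minimal sketch and not part of the original talk; the function names and the parameter names `dt_local` (the difference in the instruction's execution time) and `dt_global` (the difference in the whole program's execution time) are illustrative assumptions.

```python
def is_timing_anomaly(dt_local: float, dt_global: float) -> bool:
    """Check the two cases of the timing-anomaly definition.

    dt_local  -- change of the execution time of one instruction (assumed name)
    dt_global -- change of the execution time of the whole program (assumed name)
    """
    if dt_local < 0:
        # The instruction got faster: anomalous if the program gains more
        # than the local acceleration, or if it runs longer than before.
        return dt_global < dt_local or dt_global > 0
    if dt_local > 0:
        # The instruction got slower: anomalous if the program loses more
        # than the local delay, or if it runs shorter than before.
        return dt_global > dt_local or dt_global < 0
    # No local difference: the definition above does not apply.
    return False


def is_critical_for_wcet(dt_local: float, dt_global: float) -> bool:
    """The case singled out as critical: locally faster, globally slower."""
    return dt_local < 0 and dt_global > 0
```

For example, a cache hit that saves 2 cycles locally but costs 3 cycles globally is both an anomaly and critical, whereas a 2-cycle local gain that yields a 1-cycle global gain is not an anomaly.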

Unfortunately, as [LS99,Lun02] have observed, the worst case penalties im- 
posed by a timing anomaly may not be bounded by an architecture-dependent, 
but program-independent constant, but may depend on the program size. This 
is the so-called Domino Effect. This domino effect was shown to exist for the 
Motorola PowerPC 755 in [Sch03].

4 Formal Analysis of Processor Timing Models 

The abstract-interpretation-based timing analysis is based on abstract processor 
models that are conservative with respect to the timing behavior of the concrete 
processors. To prove this is a major endeavor to be undertaken in the Transre- 
gional Collaborative Research Center AVACS. Another line of research is the 
derivation of processor timing models from formal specifications in VHDL or 
Verilog. 

We are currently applying formal analysis to timing models to check for
relevant properties, e.g., using model checking to detect timing anomalies and
domino effects, or to show their absence. Bounded model checking can be used to
check for the existence of upper bounds on the damage done by one processor component
to the state of another one, e.g. the damage done by a branch misprediction to the
instruction cache by loading superfluous instructions. The bound can be
computed from architectural parameters, such as the depth of the pipeline and the
length of prefetch queues.

Acknowledgements. Thanks go to Stephan Thesing for clarifications about
timing anomalies.

Abstract. Explicit model checking algorithms explore the full state
space of a system. We have gathered a large collection of state spaces and 
performed an extensive study of their structural properties. The results 
show that state spaces have several typical properties and that they dif- 
fer significantly from both random graphs and regular graphs. We point 
out how to exploit these typical properties in practical model checking 
algorithms. 



1 Introduction 

Model checking is an automatic method for formal verification of systems. In this 
paper we focus on explicit model checking which is the state-of-the-art approach 
to verification of asynchronous models (particularly protocols). This approach 
explicitly builds the full state space of the model (also called Kripke structure, 
occurrence or reachability graph). The state space represents all (reachable) 
states of the system and transitions among them. The state space is used to check 
specifications expressed in a suitable temporal logic. The main obstacle of model 
checking is state explosion — the size of the state space grows exponentially 
with the size of the model description. Hence, model checking has to deal with 
extremely large graphs. 

The classical model for large unstructured graphs is the random graph model 
of Erdős and Rényi [11]. In this model every pair of nodes is connected with an
edge with a given probability p. Large graphs are studied in many diverse areas, 
such as social sciences (networks of acquaintances), biology (food webs, protein 
interaction networks), geography (river networks), and computer science (Inter- 
net traffic, world wide web). Recent extensive studies of these graphs revealed 
that they share many common structural properties and that these properties 
differ significantly from properties of random graphs. This observation led to the 
development of more accurate models for large graphs occurring in practice (e.g., 
‘small worlds’ and ‘scale-free networks’ models) and to a better understanding 
of processes in these networks. For example, it improved the understanding of 
the spread of diseases and vulnerability of computer networks to attacks; see 
Barabási [2] and Watts [32] for a high-level overview of this research and further
references. 

* Supported by GA ČR grant no. 201/03/0509

In model checking, we usually treat state spaces as arbitrary graphs. However,
since state spaces are generated from short descriptions, it is clear that they have 
some special properties. This line of thought leads to the following questions: 

1. What do state spaces have in common? What are their typical properties? 

2. Can state spaces be modeled by random graphs or by some class of regular 
graphs in a satisfactory manner? 

3. Can we exploit these typical properties to traverse or model check a state 
space more efficiently? Or at least to better analyze complexity of algo- 
rithms? Can some information about a state space be of any use to the user 
or to the developer of a model checker? 

4. Is there any significant difference between toy academic models and
real-life case studies? Are state spaces similar to such an extent that it does not
matter which models we choose for benchmarking our algorithms?

Methodology. The basic approach is the following: we measure many graph 
parameters of a large collection of state spaces and try to draw answers from the 
results. We restrict ourselves to asynchronous models, because these are typically 
investigated by explicit model checkers. We consider neither labels on edges nor 
atomic propositions in states and thus we focus only on structural properties of 
graphs. For generating state spaces we have used four well-known model checkers 
(SPIN [22], CADP [14], Murphi [10], μCRL [16]) and two experimental model
checkers. In all, we have used 55 different models including many large case 
studies (see Appendix A). In this report we summarize our observations, point 
out possible applications, and try to outline some answers. The project’s web 
page [1] contains more details about investigated state spaces and the way in 
which they were generated. Moreover, the interested reader can find on the web page
all state spaces in a simple textual format together with a detailed report for 
each of them, summary tables for each measured parameter, and more summary 
statistics and figures. 

Related work. Many authors point out the importance of the study of models 
occurring in practice (e.g., [13]). But to the best of our knowledge, there has been 
no systematic work in this direction. In many articles one can find remarks and
observations concerning typical values of individual parameters, e.g., diameter [5,
28], back level edges [31,3], degree, stack depth [20]. Some authors make implicit
assumptions about the structure of state spaces [7,23] or claim that the usefulness 
of their approach is based on characteristics of state spaces without actually 
identifying these characteristics [30]. Groote and van Ham [17] try to visualize 
large state spaces with the goal of providing the user with better insight into a 
model. 

Organization of the paper. Section 2 describes the studied parameters, the results
of measurements, their analysis, and possible applications. Section 3 compares
different classes of state spaces. An impatient reader can jump directly to Section 4
where our observations are summarized and where we provide some answers.
Finally, the last section outlines several new questions for future research.

2 Parameters of State Spaces

A state space is a relational structure which represents the behavior of a system 
(program, protocol, chip, ... ). It represents all possible states of the system and 
transitions between them. Thus we can view a state space as a simple directed
graph (see footnote 1) $G = (V, E, v_0)$ with a set of vertices $V$, a set of
directed edges $E \subseteq V \times V$,
and a distinguished initial vertex $v_0$. Moreover, we suppose that all vertices are
reachable from the initial one. In the following we use graph when talking about 
generic notions and state space when talking about notions which are specific to 
state spaces of asynchronous models. 



2.1 Degrees 

Out-degree (in-degree) of a vertex is the number of edges leading from (to) this
vertex. Average degree is just $|E|/|V|$. The basic observation is that the average
degree is very small, typically around 3 (Fig. 1). Maximal in-(out-)degree is
often several times higher than the average degree but with respect to the size of
the state space it is small as well. Hence state spaces do not contain any ‘hubs’.
In this respect state spaces are similar to random graphs, which have Poisson
distribution of degrees. On the other hand, scale-free networks discussed in the
introduction are characterized by the power-law distribution of degrees and the
existence of hubs is a typical feature of such networks [2].

The fact that state spaces are sparse is not surprising and was observed long
ago (see footnote 2). It can be quite easily explained: the degree corresponds to a
‘branching factor’ of a state; the branching is due to parallel components of the model
and due to the inner nondeterminism of components; and both of these are
usually very small. In fact, it seems reasonable to claim that in practice
$|E| \in O(|V|)$. Nevertheless, the sparseness is usually not taken into account either
in the construction of model checking algorithms or in the analysis of their 
complexity. 

In many cases the average degree is even smaller than two, since there are 
many vertices with degree one. This observation can be used for saving some 
memory during the state space traversal [4]. 
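
The degree parameters of this section are cheap to measure. The sketch below is an illustration only, not the tooling used for the reported measurements; the function name `degree_statistics`, the edge-list input format and the choice to count vertices with out-degree one are assumptions.

```python
def degree_statistics(vertices, edges):
    """Average and maximal degrees of a simple directed graph.

    vertices -- iterable of hashable vertex identifiers
    edges    -- iterable of (source, target) pairs; duplicates and
                self-loops are ignored, as the text considers simple graphs
    """
    out_deg = {v: 0 for v in vertices}
    in_deg = {v: 0 for v in vertices}
    for u, v in set(edges):          # drop multiedges
        if u == v:                   # drop self-loops
            continue
        out_deg[u] += 1
        in_deg[v] += 1
    num_edges = sum(out_deg.values())
    return {
        "avg_degree": num_edges / len(out_deg),          # |E| / |V|
        "max_out_degree": max(out_deg.values()),
        "max_in_degree": max(in_deg.values()),
        # Assumed reading of "degree one": exactly one outgoing edge.
        "out_degree_one": sum(1 for d in out_deg.values() if d == 1),
    }
```

A state space in which most vertices have out-degree one yields an average degree below two, which is the situation described in the last paragraph above.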



2.2 Strongly Connected Components 

A strongly connected component (SCC) of $G$ is a maximal set of states $C \subseteq V$
such that for each $u, v \in C$, the vertex $v$ is reachable from $u$ and vice versa. The
quotient graph of $G$ is a graph $(W, H)$ such that $W$ is the set of the SCCs of $G$
and $(C_1, C_2) \in H$ if and only if $C_1 \neq C_2$ and there exist $r \in C_1$, $s \in C_2$
such that $(r, s) \in E$. The SCC quotient height of the graph $G$ is the length of the
longest path in the quotient graph of $G$. A component is trivial if it contains only one
vertex. Finally, a component is terminal if it has no successor in the quotient
graph.

1 We consider state spaces as simple graphs, i.e., we do not consider self-loops and
multiedges. Although these may be significant for model checking temporal logics,
they are not that important for the structural properties we consider here.

2 Holzmann [20] gives an estimate of 2 for the average degree.

Fig. 1. Degree statistics (panels: Avg. degree, Max. out-degree, Max. in-degree).
Values are displayed with the boxplot method. The upper
and lower lines are maximum and minimum values, the middle line is a median, the
other two are quartiles. Circles mark outliers. Note the logarithmic y-axis.

For state spaces, the height of the SCC quotient graph is small. In all but one 
case it is smaller than 200, in 70% of cases it is smaller than 50. The structure
of the quotient graph has one of the following types:

— there is only one SCC component (18% of cases), 

— there are only trivial components (the graph is acyclic) (14% of cases), 

— there is one large component which contains most states; the largest com- 
ponent is usually terminal and often it is even the only terminal. 

The number of SCCs can be very high, but this is mainly due to trivial com- 
ponents. The conclusion is that most states lie either in the largest component 
or in some trivial component and that the largest component tends to be ‘at the 
bottom’ of the SCC quotient graph. 
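
The SCC parameters discussed here can be computed in linear time. The sketch below is one possible way to do so, under stated assumptions: it uses Kosaraju's two-pass algorithm (a choice made for this illustration, not a method named in the paper), measures the quotient height as the number of edges on the longest path, and uses the hypothetical name `scc_quotient_statistics`.

```python
def scc_quotient_statistics(vertices, edges):
    """SCC count, trivial SCCs, largest SCC and SCC quotient height."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: iterative DFS, recording vertices in order of completion.
    finished, seen = [], set()
    for root in succ:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:                     # all successors of v are explored
                finished.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    # Components are discovered in topological order of the quotient graph.
    comp, comp_order, comp_size = {}, [], {}
    for root in reversed(finished):
        if root in comp:
            continue
        comp[root] = root
        comp_order.append(root)
        comp_size[root] = 0
        todo = [root]
        while todo:
            v = todo.pop()
            comp_size[root] += 1
            for w in pred[v]:
                if w not in comp:
                    comp[w] = root
                    todo.append(w)

    # Edges of the quotient graph connect distinct components only.
    q_succ = {c: set() for c in comp_order}
    for u, v in edges:
        if comp[u] != comp[v]:
            q_succ[comp[u]].add(comp[v])

    # Longest path in the quotient DAG, processing terminal components first.
    height = {}
    for c in reversed(comp_order):
        height[c] = max((1 + height[d] for d in q_succ[c]), default=0)

    return {
        "num_sccs": len(comp_order),
        "num_trivial": sum(1 for s in comp_size.values() if s == 1),
        "largest_scc": max(comp_size.values()),
        "quotient_height": max(height.values()),
    }
```

With this convention a graph that forms a single SCC has quotient height 0, and an acyclic graph has a quotient height equal to the length of its longest path.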

SCCs play an important role in many model checking algorithms and the 
above stated observation can be quite significant with respect to practical ap- 
plicability of some approaches, for example: 

— The runtime of symbolic SCC decomposition algorithms [12,25] depends very 
much on the structure of the SCC quotient graph. The thorough analysis [27] 
shows that the complexity of these algorithms depends on the SCC quotient 
height, the number of SCCs, and the number of nontrivial SCCs. We note that
symbolic algorithms are usually used for synchronous models (whereas our
state spaces correspond to asynchronous ones) and thus our observations are
not directly applicable here. However, the distributed explicit cycle detection
algorithm [6] has complexity proportional to the SCC quotient height as well.

— The existence of one large component shows the limited applicability of 
some algorithms. The ‘sweep line’ method [8] of the state space exploration 
saves memory by deleting states which are in a fully explored SCC. The 
distributed cycle detection based on partitioning the state space with respect 
to SCCs [23] assigns to each computer in a network one or more SCCs. 

— On the other hand, some algorithms could be simpler for state spaces which 
have one big component. For example, during a random walk there is little
danger that the algorithm will get stuck in some small component.



2.3 Properties of Breadth-First and Depth-First Search 

The basic model checking procedure is a reachability analysis - searching a state 
space for an error state. Here we consider two basic methods for state space 
traversal and their properties. 



Breadth-First Search (BFS). Let us consider the BFS from the initial vertex
$v_0$. A level of the BFS with index $k$ is the set of states with distance from
$v_0$ equal to $k$. The BFS height is the largest index of a non-empty level. An edge
$(u, v)$ is a back level edge if $v$ belongs to a level with a lower or the same index
as $u$. The length of a back level edge is the difference between the indices of the
two levels.
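
These definitions translate directly into a single breadth-first traversal. The following is a minimal sketch; the function name `bfs_level_statistics` and the adjacency-dictionary input (`succ` maps a vertex to a list of its successors) are assumptions made for the example.

```python
from collections import deque


def bfs_level_statistics(succ, v0):
    """Return (BFS height, sizes of the levels, lengths of back level edges)."""
    level = {v0: 0}                   # BFS level index of each reached vertex
    queue = deque([v0])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)

    height = max(level.values())      # largest index of a non-empty level

    level_sizes = [0] * (height + 1)  # data behind a "BFS level graph"
    for k in level.values():
        level_sizes[k] += 1

    # An edge (u, v) is a back level edge if level[v] <= level[u];
    # its length is the difference of the two level indices.
    back_lengths = [
        level[u] - level[v]
        for u in level
        for v in succ.get(u, ())
        if level[v] <= level[u]
    ]
    return height, level_sizes, back_lengths
```

Plotting `level_sizes` against the level index gives the level sizes per index, and a histogram of `back_lengths` shows whether a few lengths dominate.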

— The BFS height is small (Fig. 2). There is no clear correlation between the 
state space size and the BFS height. It depends rather on the type of the 
model. 

— The sizes of levels follow a typical pattern. If we plot the number of states 
on a level against the index of a level we get a BFS level graph (see footnote 3). See Fig. 3
for several such graphs. Usually this graph has a ‘bell-like’ shape. 

— The relative number of back level edges is (rather uniformly) distributed 
between 0% and 50%. Most edges are local — they connect two close levels 
(as already observed by Tronci et al. [31]). However, for most models there 
exist some long back level edges. For exact results and statistics see [1].

— For most systems we observe that there are only a few typical lengths of 
back level edges and that most back level edges have these lengths. This is 
probably caused by the fact that back level edges correspond to jumps in 
the model. There are typically only a reasonably small number of different 
jumps in a model. 



3 Note that the word ‘graph’ is overloaded here. In this context we mean graph in the
functional sense.

Fig. 2. The BFS height plotted against the size of the state space (number of
vertices). Note the logarithmic x-axis. Three examples have height larger than 300.



— Tronci et al. [31,24] exploit locality of edges for state space caching. Their 
technique stores only recently visited states. The efficiency (and termination)
of their algorithm relies on the fact that most edges are local and hence target
states of edges are usually in the cache. In a similar way, one could exploit 
typical lengths of back level edges or try to estimate the maximal length of 
a back level edge and use this estimate as a key for removing states from the 
cache. 

— The algorithm for distributed cycle detection by Barnat et al. [3] has com- 
plexity proportional to the number of back level edges. 

— The typical shape of the BFS level graph can be exploited for a prediction 
of the size of a state space. Particularly, when a model checker runs out of 
memory it may be useful to see the BFS level graph — it can help the user 
to decide whether it will be sufficient just to use a more powerful computer
(or a distributed computation on several computers) or whether the model is 
hopelessly big and it is necessary to use some reduction and/or abstraction. 
This is easy to implement (and add to existing model checkers) and in our 
experience it can be very useful to the user. 

Depth-First Search (DFS). Next we consider the depth-first search from the
initial vertex. The behavior of DFS (but not the completeness) depends on the
order in which successors of each vertex are visited. Therefore we have considered
several runs of DFS with different orderings of successors.

If we plot the size of the stack during DFS we get a stack graph. Fig. 4 shows 
several stack graphs (for more graphs see [1]). The interesting observation is 
that the shape of the graph does not depend much on the ordering of successors. 
The stack graph changes a bit of course, but the overall appearance remains 
the same. Moreover, each state space has its own typical graph. In contrast, all 
random graphs have rather the same, smooth stack graph. 
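
A stack graph can be recorded with a few lines added to an iterative DFS. The sketch below is illustrative only; the name `dfs_stack_graph`, the adjacency-dictionary input and the decision to sample the stack size after every push or pop are assumptions.

```python
def dfs_stack_graph(succ, v0):
    """Run an iterative DFS from v0 and return the sequence of stack sizes.

    succ maps a vertex to the list of its successors; the order of that
    list is the order in which successors are visited.
    """
    visited = {v0}
    stack = [(v0, iter(succ.get(v0, ())))]
    sizes = [1]                       # stack size after each step
    while stack:
        v, successors = stack[-1]
        for w in successors:
            if w not in visited:
                visited.add(w)
                stack.append((w, iter(succ.get(w, ()))))
                break
        else:                         # no unvisited successor left: backtrack
            stack.pop()
        sizes.append(len(stack))
    return sizes
```

Running the function on the same graph with permuted successor lists gives the different runs compared in the text, and `max(sizes)` is the maximal stack size of a run.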

When we count the lengths of cycles encountered during DFS we find out
that there are several typical lengths of cycles which occur very often; after the 
observation of the typical lengths of back level edges this does not come as a 
great surprise. 

These observations point out interesting structural properties of state spaces 
(and their differences from random graphs) but do not seem to have many direct 
applications. The only one is the stack cycling technique [21] which exploits the 
fact that the size of the stack does not change too quickly and stores part of the 
stack on the magnetic disc. Stack graphs could provide better insight into how 
to manage this process. 




Fig. 4. Stack graphs. The first one is the stack graph of a very simple model. Stack
graphs of random graphs are similar to this one. The other three stack graphs
correspond to more complex models.

Queue and Stack Size. For implementations of the breadth- and depth-first
search one uses queue and stack data structures. These data structures are in 
most model checkers treated differently from a set of already visited states. This 
set (usually implemented as a hash table) is considered to be the main memory 
consumer. Therefore its size is reduced using sophisticated techniques: states are 
compressed with lossless compression [19] or bit-state hashing [18], stored on 
magnetic disc [29], or only some states are stored [4,15]. On the other hand, the 
whole queue/stack is typically kept in memory without any compression. Our 
measurements show that the sizes of these structures are often as much as 10% of 
the size of a state space; see Fig. 5 for results and comparison of queue and stack 
sizes. Thus it may happen that the applicability of a model checker becomes 
limited by the size of a queue/stack data structure. Therefore it is important 
to pay attention to these structures when engineering a model checker. We note 
that this is already done in some model checkers: SPIN can store part of a
stack on disc [21], and UPPAAL stores all states in the hash table and maintains
only references in a queue/stack [9].




Fig. 5. A comparison of maximal queue and stack sizes expressed as percentages of the
state space size (recoverable axis label: max. stack). Note that the relative size of a
queue is always smaller than 40% of the state space size whereas the relative size of a
stack can go up to 90% of the state space size.

2.4 Distances

The diameter of a graph is the length of the longest shortest path between two
vertices. The girth of a graph is the length of the shortest cycle. Since diameter
and girth are expensive to compute (see footnote 4), we can determine them only for
small state spaces.

However, experiments for small graphs reveal that we can compute good 
estimates of these parameters with the use of the breadth- and depth-first search. 
The BFS height can be used to estimate the diameter. For most state spaces 
the diameter is smaller than 1.5 times the BFS height. Note that for general 
graphs the diameter can be much larger than the BFS height. DFS can be used 
to estimate the girth - it is not guaranteed to find the shortest cycle but our 
experience shows that in practice it does. 
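
For small graphs the relation between the diameter and the BFS height can be checked directly. The following sketch computes the exact diameter by running a BFS from every vertex, which is the quadratic cost mentioned above; the function names and the adjacency-dictionary input are assumptions, and distances are taken over reachable pairs only.

```python
from collections import deque


def bfs_distances(succ, source):
    """Shortest distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bfs_height(succ, v0):
    """Largest BFS level index when searching from the initial vertex v0."""
    return max(bfs_distances(succ, v0).values())


def diameter(succ):
    """Longest shortest path over all reachable pairs (one BFS per vertex)."""
    vertices = set(succ) | {v for targets in succ.values() for v in targets}
    return max(bfs_height(succ, v) for v in vertices)
```

Comparing `diameter(succ)` with `1.5 * bfs_height(succ, v0)` on a small state space reproduces the kind of check behind the estimate stated above.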

It is a ‘common belief’ (only partially supported in some papers) that the 
diameter is small. Our experiments confirm this belief. In most cases the diameter
is smaller than 200, often much smaller (see footnote 5). The girth is in most cases
smaller than 8.

The fact that the diameter is small is practically very important. Several al- 
gorithms (e.g., [28,12,25]) and the bounded model checking approach [5] directly 
exploit this fact. Moreover, the fact that the diameter is small suggests that 
many of the very long counterexamples (as produced by some model checkers) 
are caused by a poor search and not by the inherent complexity of the bug. 



2.5 Local Structure 

As the next step we try to analyze the local structure of state spaces. In order 
to do so, we employ some ideas from the analysis of social networks. A typical
characteristic of social networks is clustering: two friends of one person are
friends with each other with much higher probability than two randomly picked persons.
Thus vertices have a tendency to form clusters. This is a significant feature which 
distinguishes social networks from random graphs. 

In state spaces we can expect some form of clustering as well: two successors
of a state are more likely to have some close common successor than two
randomly picked states. Specifically, state spaces are well-known to contain many
‘diamonds’. We try to formalize these ideas and provide some experimental base 
for them. 

— A diamond rooted at $v_1$ is a quadruple $(v_1, v_2, v_3, v_4)$ such that
$\{(v_1, v_2), (v_1, v_3), (v_2, v_4), (v_3, v_4)\} \subseteq E$.

— The k-neighborhood of v is the set of vertices with distance from v smaller 
or equal to k. 

4 In the context of large state spaces even quadratic algorithms are expensive. 

5 Diameters of state spaces are very small with respect to their size and to the theo- 
retical worst case. But compared to random graphs or ‘small world’ networks it is 
still rather large (the diameter of these graphs is proportional to the logarithm of 
their size). 




Typical Structural Properties of State Spaces 






— The k-clustering coefficient of a vertex v is the ratio of the number of edges
to the number of vertices in the k-neighborhood (not counting v itself). If
the clustering coefficient is equal to 1, no vertex in the neighborhood has two
incoming edges within this neighborhood. A higher coefficient implies that
there are several paths to some vertices within the neighborhood. Random
graphs have clustering coefficients close to 1.
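
The three definitions above can be written down directly. The following Python sketch is our own illustration and makes some assumptions that the text leaves open: the graph is an adjacency dictionary, the two middle vertices of a diamond are required to be distinct, and the edges counted for the clustering coefficient are those with both endpoints inside the neighborhood. All function names and example graphs are hypothetical.

```python
from collections import deque
from itertools import combinations


def k_neighborhood(graph, v, k):
    """Vertices whose distance from v is at most k (v itself included)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond distance k
        for w in graph.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def k_clustering(graph, v, k):
    """Edges inside the k-neighborhood divided by its vertices other than v."""
    hood = k_neighborhood(graph, v, k)
    edges = sum(1 for u in hood for w in set(graph.get(u, ())) if w in hood)
    others = len(hood) - 1
    return edges / others if others else 0.0


def diamonds_rooted_at(graph, v1):
    """All (v1, v2, v3, v4) with edges v1->v2, v1->v3, v2->v4, v3->v4."""
    found = []
    for v2, v3 in combinations(sorted(set(graph.get(v1, ()))), 2):
        common = set(graph.get(v2, ())) & set(graph.get(v3, ()))
        found.extend((v1, v2, v3, v4) for v4 in sorted(common))
    return found


# Hypothetical examples: two interleaved independent actions, and a small tree.
interleaving = {"s": ["a", "b"], "a": ["ab"], "b": ["ab"], "ab": []}
tree = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}

print(diamonds_rooted_at(interleaving, "s"))
print(k_clustering(interleaving, "s", 2), k_clustering(tree, "s", 2))
```

In the tree every vertex has a single incoming edge, so the coefficient is exactly 1; the interleaving example contains one diamond and its coefficient rises to 4/3.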

The measurements confirm that the local structure of state spaces differs
significantly from that of random graphs (see [1] for more details):

— The size of neighborhood grows much more slowly for state spaces than for 
random graphs (Fig. 6). This is because the clustering coefficient of state 
spaces increases (rather linearly) with average degree. 

— Diamonds display an interesting dependence on the average degree. For a 
state space with average degree less than two there are a small number of 
diamonds. For state spaces with average degree larger than two there are a 
lot of them. 

— Although girth is small for all state spaces, short cycles are abundant only 
in some graphs — only one third of state spaces have many short cycles. 

— The local structure is similar in all parts of a state space.

Fig. 6. Relationship between the size of the 4-neighborhood and the average degree,
and a comparison with random graphs.

The bottom line of these observations is that the local structure depends
very much on the average degree. If the average degree is small, the local
structure of the state space is tree-like (without diamonds and short cycles, and
with many states of degree one), whereas with a high average degree it has many
diamonds and a high clustering coefficient. The rather surprising consequence is
that the local structure depends on the model only inasmuch as the average
degree does.

This is just the first step in understanding the local structure of state spaces, 
so it is difficult to give any specific applications. Some of these properties could 
be exploited by traversal methods which do not store all states [4]. Since the 
size of a neighborhood grows rather slowly, it might be feasible to do some kind 
of ‘look-ahead’ during the exploration of a state space (this is not the case for 
arbitrary graphs). 



3 Comparisons 

3.1 Specification Languages and Tools 

Most parameters seem to be independent of the specification language used 
for modeling and the tool used for generating a state space. In fact, the same 
protocols modeled in different languages yield very similar state spaces. This can 
be seen as an encouraging result since it says that it is fair to do experimental 
work with just one model checker. 

We have noticed some small differences. For example, Promela models often 
have sparser state spaces. But because we do not have the same set of examples 
modeled in all specification languages, we cannot fully support these observations 
at the moment. 



3.2 Toy versus Industrial Examples 

We have manually classified examples into three categories: toy (16), simple (25), 
and complex (14) (see Appendix A). The major criterion for the classification 
was the length of the model description. The comparison shows differences in 
most parameters. Here we only briefly summarize the main trends; more detailed 
figures can be found on the project’s web page [1]: 

— The average degree is smaller for state spaces of complex models. This is 
important because the average degree has a strong correlation with the local 
structure of the state space (see Section 2.5). 

— The maximal size of the stack during DFS is significantly smaller for complex
models (Fig. 7).

— The BFS height and the diameter are larger for state spaces of complex 
models. 

— The number of back level edges is smaller for state spaces of complex models 
but they have longer back level edges.

— Global structure is more regular for state spaces of toy models. This
is demonstrated by BFS level graphs and stack graphs which are much
smoother for state spaces of toy models.

Fig. 7. The maximal stack size (given in percent of the state space size) during DFS,
shown separately for complex, simple, and toy models.
Results are displayed with the boxplot method (see Fig. 1 for explanation).

These results stress the importance of having complex case studies in model
checking benchmarks. In particular, experiments comparing explicit and symbolic
methods are often done on toy examples. Since toy examples have more regular
state spaces, they can be represented symbolically more easily.



3.3 Similar Models 

We also compared the state spaces of similar models: parametrized models
with different parameter values, abstracted models, and models with small
syntactic changes. Moreover, we have compared full state spaces and state spaces
reduced with partial order reduction and strong bisimulation.

The resulting state spaces are very similar: most parameters are (nearly)
the same or are appropriately scaled with respect to the change in the size
of the state space. The exception is that a small syntactic change in a model
can sometimes produce a big change in the state space. This occurs mainly
in cases where the small change corresponds to some error (or correction) in
the model. This suggests that listing the parameters of the state space could be
useful for users during modeling: a significant change of parameters between two
consecutive versions of a model can serve as a warning of a potential error (this
can even be done automatically).

4 Conclusions: Answers 

Although we have done our measurements on a restricted sample of state spaces,
we believe that it is possible to draw general conclusions from the results. We
used several different model checkers, and the models were written by several
different users. The results of the measurements are consistent: there are no
significant exceptions to the reported observations.

What are typical properties of state spaces? 

State spaces are usually sparse, without hubs, with one large SCC, with small 
diameter and small SCC quotient height, with many diamond-like structures. 

These properties cannot be explained theoretically. It is not difficult to
construct artificial models without these features. This means that the observed
properties of state spaces are the result neither of the way state spaces are
generated nor of some features of specification languages, but rather of the way
humans design and model systems.

Can state spaces be modeled by random graphs or by some class of regular graphs? 

State spaces are neither random nor regular. They have some internal struc- 
ture, but this structure is not strictly regular. This is illustrated by many of our 
observations: 

— local clustering (including diamonds) is completely absent in random graphs 

— stack graphs and BFS level graphs are quite structured and ragged, while 
for both regular and random graphs they are much smoother 

— typical values of lengths of back level edges and cycles 

— the diameter is larger than for random graphs but small compared to the size 
of the state space (definitely much smaller than for common regular graphs) 

Can we exploit typical properties during model checking? 

Typical properties can be useful in many different ways. Throughout the pa- 
per we provide several pointers to work that exploits typical values of parameters 
and we give some more suggestions about how to exploit them. 

Values of parameters are not very useful for non-expert users, who are usually
not aware of what a state space is, but they may be useful for advanced
users of a model checker. Properties of the underlying state space can provide
users with feedback on their modeling and with sanity checks: users can compare
the obtained parameters against their intuition (particularly useful for SCCs) and
compare parameters of similar models, e.g., the original and a modified model.

The parameter values can definitely be useful for developers of tools,
particularly for researchers developing new algorithms: they can help to explain
the behavior of new algorithms.

Are there any differences between toy and complex models?

Although state spaces have some properties in common, others can differ
significantly. The behavior of some algorithms can depend strongly on the structure
of the state space. We can illustrate this with an experiment on random walks. We
performed a series of very simple experiments with random walks on the generated
state spaces. For some graphs one can quickly cover 90% of the state space
by a random walk, whereas for others we were not able to get beyond 3%. So it
is really important to test algorithms on a large number of models before
drawing any conclusions.
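
A random walk experiment of this kind might look like the following Python sketch. It is not the code used for the experiments; the function name, the restart-on-deadlock rule, the seed parameter, and both example graphs are assumptions made for illustration.

```python
import random


def random_walk_coverage(graph, initial, steps, seed=0):
    """Fraction of the states of `graph` visited by one random walk."""
    rng = random.Random(seed)
    visited = {initial}
    state = initial
    for _ in range(steps):
        successors = graph.get(state, ())
        # Assumption of this sketch: restart from the initial state on a deadlock.
        state = rng.choice(list(successors)) if successors else initial
        visited.add(state)
    return len(visited) / len(graph)


# Hypothetical graphs: a ring, which a walk covers completely, and a graph
# whose first branch leads into one of two parts that cannot be left again.
ring = {i: [(i + 1) % 10] for i in range(10)}
branching = {0: [1, 2], 1: [1], 2: [3], 3: [4], 4: [4]}

print(random_walk_coverage(ring, 0, steps=9))
print(random_walk_coverage(branching, 0, steps=1000))
```

However many steps are taken, a single walk on the second graph never covers more than four of its five states, because the first choice is irreversible. This is a small-scale analogue of the large differences in coverage reported above.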

In particular, there is a significant difference between state spaces
corresponding to complex and toy models. Moreover, we have pointed out that state
spaces of similar models are very similar. We conclude that it is not adequate to
perform experiments on just a few instances of some toy example, and we support
calls for a robust set of benchmark examples for model checking [13].

5 Future Work: New Questions 

— What more can we say about state spaces when we consider atomic proposi- 
tions in states (respectively good/bad states)? What is the typical distribu- 
tion of good/bad states? How many are there? Where are they? What are 
the properties of product graphs used in LTL model checking (product with a
Büchi automaton) and branching time logic model checking (game graphs)?
Do they have the same properties or are there any significant differences? 

— Can we estimate structural properties of a state space from static analysis 
of its model? 

— In this work we consider mainly ‘static’ properties of state spaces. We have 
briefly mentioned only the breadth- and depth-first search, but there are 
many other possible searches and processes over state spaces (particularly 
random walk and partial searches). What is the ‘dynamics’ of state spaces? 

— What is the effect of efficient modeling [26] on the resulting state space? 

— State spaces are quite structured and regular. But how can we capture this 
regularity exactly? How can we employ this regularity during model check- 
ing? Can the understanding of the local structure help us to devise symbolic 
methods for asynchronous models?

A Models

Tool    Model                                         Type     Size
Murphi  Peterson’s mutual exclusion algorithm         toy      882
Murphi  Parallel sorting                              toy      3,000
Murphi  Hardware arbiter                              simple   1,778
Murphi  Distributed queuing lock                      simple   7,597
Murphi  Needham-Schroeder protocol                    complex  980
Murphi  Dash protocol                                 complex  1,694
Murphi  Cache coherence protocol                      complex  15,703
Murphi  Scalable coherent interface (SCI)             complex  38,034
SPIN    Peterson protocol for 3 processes             toy      30,432
SPIN    Dining philosophers                           toy      54,049
SPIN    Concurrent sorting                            toy      107,728
SPIN    Alternating bit protocol                      simple   442
SPIN    Readers, writers                              simple   936
SPIN    Token ring                                    simple   7,744
SPIN    Snooping cache algorithm                      simple   14,361
SPIN    Leader election in unidirectional ring        simple   15,791
SPIN    Go-back-N sliding window protocol             simple   35,861
SPIN    Cambridge ring protocol                       simple   162,530
SPIN    Model of cell-phone handoff strategy          simple   225,670
SPIN    Bounded retransmission protocol               simple   391,312
SPIN    ITU-T multipoint communication service        complex  5,904
SPIN    Flight guidance system                        complex  57,786
SPIN    Flow control layer validation                 complex  137,897
SPIN    Needham-Schroeder public key protocol         complex  307,218
CADP    Alternating bit                               simple   270
CADP    HAVi leader election protocol                 simple   5,107
CADP    INRES protocol                                simple   7,887
CADP    Invoicing case study                          simple   16,110
CADP    Car overtaking protocol                       simple   56,482
CADP    Philips’ bounded retransmission protocol      simple   60,381
CADP    Directory-based cache coherency protocol      simple   70,643
CADP    Reliable multicast protocol                   simple   113,590
CADP    Cluster file system                           complex  11,031
CADP    CO4 protocol for distributed knowledge bases  complex  25,496
CADP    IEEE 1394 high performance serial bus         complex  43,172
μCRL    Chatbox                                       toy      65,536
μCRL    Onebit sliding window protocol                simple   319,732
μCRL    Modular hef system                            complex  15,349
μCRL    Link layer protocol of the IEEE-1394          complex  371,804
μCRL    Distributed lift system                       complex  129,849
Divine  Cabbage, goat, wolf puzzle                    toy      52
Divine  Dining philosophers                           toy      728
Divine  MSMIE protocol                                simple   1,241
Divine  Bounded retransmission protocol               simple   6,093
Divine  Alternating bit protocol                      simple   11,268
MASO    Aquarium example                              toy      6,561
MASO    Token ring                                    toy      7,680
MASO    Alternating bit protocol                      toy      11,268
MASO    Adding puzzle                                 toy      56,561
MASO    Elevator                                      simple   643,298
Abstract. State caching makes the full exploration of large state spaces
possible by storing only a subset of the reachable states. While memory 
requirements are limited, the time consumption can increase dramati- 
cally if the subset is too small. It is often claimed that state caching is 
effective when the cache is larger than between 33% and 50% of the total 
state space, and that random replacement of cached states is the best 
strategy. Both these ideas are re-investigated in this paper. In addition, 
the paper introduces a new technique, stratified caching, that reduces 
time consumption by placing an upper bound on the extra work caused 
by state caching. This, and a variety of other strategies are evaluated 
for random graphs and graphs based on actual verification models. Mea- 
surements made with Spin are presented. 



1 Introduction 

Model checking by explicit state enumeration has become increasingly successful 
in the last decade, but suffers from the well-known state space explosion prob- 
lem. Techniques for palliating the problem abound, but once the size of a model 
crosses a certain line, the only hope is a partial search of the state space. While 
such probabilistic approaches are valuable in detecting violations of the specifi- 
cation, they cannot guarantee correctness; when possible, a full exploration of 
the state space is, of course, preferable. 

This paper focuses on state caching, one of the earliest techniques to deal
with state space explosion. During state space exploration the reached states 
are stored in a hash table or similar data structure. When a previously visited 
state is reached again, it is found among the stored states and does not have to 
be re-explored. When a new state is generated and the state store is full (i.e., 
available memory has been exhausted), an already stored state is selected and 
discarded to make room for the new state. By replacing an old state, the model 
checker commits itself to re-investigate the state, should it be reached again. 
This may entail re-doing previous work, but full exploration of the state space 
is guaranteed. 

While this approach makes effective use of the available memory, the running 
time may increase dramatically if too few states are cached. The received wisdom 
is that state caching is effective when the cache is larger than somewhere between 
33% and 50% of the total state space, and that random replacement of states
is the best strategy. These ideas are re-investigated and found to be somewhat
misleading.

We propose a new replacement strategy called stratified caching. Broadly 
speaking, it places an upper limit on the amount of redundant work that is 
performed when an already visited but replaced state is reached again. It does 
this by only replacing states at predetermined depths, or strata , hence the name. 

Section 2 examines how state caching has been presented in the literature, 
and in Section 3 stratified caching is introduced and discussed. Section 4 eval- 
uates a range of caching strategies for both random and actual state graphs, 
some initial results for a single model are reported in Section 5, and, finally, 
conclusions are presented in Section 6. 



2 State Space Exploration and State Caching 

During state space exploration the reached states are stored in a table. When 
a previously visited state is reached again, it is found in the table and does not 
have to be re-explored. When a new state is generated and the table is full (i.e., 
the available memory has been exhausted), there are three possibilities: 

A. Abandon the search, explaining that the memory has been exhausted. 

B. Discard the new state and pretend that it has been seen before. 

C. Select an old state and replace it with the new state. 

It is not immediately clear why possibility A would ever be preferred over B 
or C. Of course, it is important to notify the user that the memory has been ex- 
hausted: she may wish to interrupt the exploration and investigate alternatives. 
It may be that the model is faulty in some way and that the memory exhaustion 
is a symptom of this. Or the user may wish to investigate a simplified model, al- 
ternative options for state exploration or other reduction strategies. But it seems 
sensible to always continue the search with either possibility B or C. However, 
the implementation of possibility B or C may require memory and time over- 
heads which the user wishes to avoid in the “default” operating mode of the state 
space exploration tool. Memory considerations are usually not critical since, as 
we shall see, state caching works well even if the cache is slightly smaller than 
the state space. In other words, the memory overhead may not make any real 
difference to the user. Time overhead is another issue and it is difficult to say, 
in general, whether it is wise to use or not use state caching as a default. 

Possibility B results in a partial exploration of the state space, and issues such
as omission probability and coverage come into play. In this case, the search may
yield false positives, claiming that the model satisfies the correctness specification
when, in fact, it does not; every violation that is reported, however, is
genuine. As indicated in the introduction, this paper looks only at full exploration.

Possibility C is known as state caching, the focus of this paper. By replacing 
the old state, the model checker commits itself to reinvestigate the old state 
should it be reached again. Although this may redo work that has been done 
before, it does not lead to incorrect results.

State caching works well when the cache is slightly smaller than the state
space. If relatively few states are replaced, the probability of revisiting a replaced 
state is low and even if it happens, the state’s successors will probably be found 
in the cache. As the cache grows smaller and smaller compared to the state 
space, the re-exploration of states grows more and more frequent, and when the 
cache is very small, the problem of redundant work becomes severe. Under these 
conditions, the probability of revisiting a replaced state, and having to revisit its 
successors and their successors is high. Furthermore, each state that is revisited 
is re-inserted in the cache (since the depth-first algorithm cannot tell that it is 
not a new state) and displaces another state in the cache, in this way making 
matters even worse. 

Depth-first search is guaranteed to terminate when state caching is used as
long as states on the depth-first search path are never replaced, and the cache is
large enough. Because states on the depth-first search path cannot be replaced,
it is possible that too small a cache can eventually “fill up” with such states, 
in which case the search cannot proceed and must terminate early. (For general 
graphs, state caching does not work at all for breadth-first search, and offers a 
limited improvement in performance for mixed depth- and breadth-first search 
strategies; in the rest of the paper only depth-first search is considered.) 
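
The behaviour described in this section can be made concrete with a small Python sketch of depth-first search with a bounded cache. It is an illustration only: random replacement is used simply because it is the easiest strategy to write down, and the names `cached_dfs`, `CacheExhausted`, and the 4x4 grid model are our own assumptions. The set `seen` exists purely to report results and would not be available to a real tool.

```python
import random


class CacheExhausted(Exception):
    """Raised when every cached state is on the DFS stack."""


def cached_dfs(successors, initial, capacity, seed=0):
    rng = random.Random(seed)
    cache = set()      # stored states; always includes every state on the stack
    on_stack = set()   # states on the current DFS path (never replaced)
    seen = set()       # distinct states, kept only for reporting in this demo
    expansions = 0     # how many times a state was explored or re-explored

    def enter(state):
        nonlocal expansions
        if len(cache) >= capacity:
            candidates = sorted(cache - on_stack)
            if not candidates:
                raise CacheExhausted("cache holds only states on the DFS stack")
            cache.remove(rng.choice(candidates))  # possibility C: replace a state
        cache.add(state)
        on_stack.add(state)
        seen.add(state)
        expansions += 1
        return state, iter(successors(state))

    stack = [enter(initial)]
    while stack:
        state, pending = stack[-1]
        for nxt in pending:
            if nxt not in cache:  # new, or visited earlier but since replaced
                stack.append(enter(nxt))
                break
        else:
            stack.pop()
            on_stack.discard(state)  # stays cached, but is now replaceable
    return seen, expansions


def grid(state, width=4):
    """Hypothetical 4x4 grid model: move right or down."""
    x, y = state % width, state // width
    moves = []
    if x < width - 1:
        moves.append(state + 1)
    if y < width - 1:
        moves.append(state + width)
    return moves


print(cached_dfs(grid, 0, capacity=16)[1])  # no replacement needed
print(cached_dfs(grid, 0, capacity=8)[1])   # some states may be re-explored
```

With a capacity of 8 the search still reaches all 16 states, since the longest search path holds only 7 states. With a capacity of 3 the cache fills up with stack states and the sketch raises `CacheExhausted`, which corresponds to the early termination mentioned above.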



2.1 State Caching in the Literature 

The first discussion of state caching for model checking is by Holzmann [10]. As 
far as we are aware, there have been three sets of papers that report significant 
results about state caching. 

• In [10], which is an overview of a number of verification techniques and not
an in-depth discussion of state caching, the author investigates state caching
for a single model of 150,000 states. States are replaced using “simple blind round
robin selection”, but it is not clear what data structure is used to store the states
and whether all states are considered equally in the evaluation of this criterion.
The conclusion of the paper is that a cache of roughly half the size of the state
space can still provide acceptable performance.

In a later paper [11] (published after but written before [10]) Holzmann 
investigates five different replacement strategies as implemented in the trace 
tool, a forerunner of Spin [12]. The strategies are based on replacing 

H1. most frequently visited states;

H2. least frequently visited states;

H3. states in the currently largest class of states, where the class of a state is
defined by the number of times it has been visited;

H4. oldest states (i.e., those states that have been in the cache longest); and

H5. states in the bottom half of the current search tree.

As before [10], no details are given about how states are stored. This is
significant, because the data structure affects the behaviour of the state cache
with respect to redundant work, memory and time consumption. The strategies
are also not clearly defined: there is no indication of how to choose between
possible candidate states in strategies H1 and H2, and the “bottom half” in
strategy H5 can be interpreted in a number of different ways.

The strategies are investigated “for a range of medium sized protocols”. Two
examples are presented, one with 4523 and the other with 8139 unique states.
The paper concludes that strategy H4 is consistently the fastest (resulting in
a tolerable increase in running time) even though in one of the examples it
performs the most unnecessary work by far (59% “double work” compared to a
maximum of 0.5% by the other strategies). No conclusions about the minimum
size of the state cache are reached. Confusingly, many later papers refer to
H4 as “random replacement”, even though it is clearly not presented as such in
the paper. Also, H5 is sometimes described as replacing the state corresponding
to the smallest subtree of the depth-first tree; this may be its intention, but it
is not how the strategy is defined in the paper.

• Jard and Jéron investigate state caching in [13] and in [14,15]. Like the earlier
work [10,11], these papers do not focus on state caching per se. In the last 
two works, the authors generate random graphs that are explored using depth- 
first search and, based on the earlier findings [11], a state cache with random 
replacement. They report that, in a typical case, a cache using 40% of the normal 
memory yields 70% more visited states and a 50% increase in running time. In 
the best case, the cache size is reduced to 10% of the state space with only a 1% 
increase in the number of visited states. 

• The effect of partial-order methods [8,18,21] on state caching is addressed 
by Godefroid, Holzmann, and Pirottin in [7]. Sleep sets [6] is a partial-order 
method that eliminates most of the interleaving of independent transitions with- 
out reducing the number of states. (The combination of sleep sets and persistent 
sets [8], which also reduces the number of states, is further investigated by Gode- 
froid [9].) In [6] state caching (without sleep sets) is first investigated for four 
models of real-world protocols using the Spin verifier [12]. The models have a 
transition/state ratio of roughly 3, and the authors report that the cache can be
reduced to between 33% and 50% of the size of the state space. These findings confirm the
general results of [10,11]. 

The details of the state caching can be determined fully because the authors
have made their software publicly available. States are stored in an open hash
table with pointers to singly linked lists of states with the same hash value. A 
state is inserted by appending it to the linked list pointed to by its hash slot. 
After each insertion, a check is made to see if the state store is full. If so, an old 
state is selected and discarded to make room for the next insertion. Although 
the paper claims to use a random replacement strategy, it is clear from the 
code that this is not entirely accurate. The linked lists are scanned cyclically 
and within each list the state that has been in the cache longest, and therefore 
occurs towards the front of the list, is always chosen first. 
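
The replacement behaviour just described can be approximated by the following Python sketch. It is not the authors' code: the class name, the use of `collections.deque` in place of singly linked lists, and the rule that the scan moves on to the next slot after every eviction are assumptions made for illustration. Protection of states on the depth-first search stack is omitted for brevity.

```python
from collections import deque


class ChainedStateCache:
    """Hash table with one chain per slot and cyclic, oldest-first eviction."""

    def __init__(self, slots, capacity):
        self.chains = [deque() for _ in range(slots)]
        self.capacity = capacity
        self.size = 0
        self.cursor = 0  # slot at which the next eviction scan starts

    def _chain(self, state):
        return self.chains[hash(state) % len(self.chains)]

    def __contains__(self, state):
        return state in self._chain(state)

    def insert(self, state):
        """Append to the chain; if the store is now full, evict one state.

        Returns the evicted state, or None if nothing was evicted.
        """
        self._chain(state).append(state)
        self.size += 1
        if self.size >= self.capacity:
            return self._evict()
        return None

    def _evict(self):
        # Scan the slots cyclically; within a chain the front element has
        # been stored longest, so it is removed first.
        for _ in range(len(self.chains)):
            chain = self.chains[self.cursor]
            self.cursor = (self.cursor + 1) % len(self.chains)
            if chain:
                self.size -= 1
                return chain.popleft()
        return None


cache = ChainedStateCache(slots=2, capacity=4)
evicted = [cache.insert(s) for s in (0, 2, 1, 4, 3, 6)]
print(evicted)
```

With integer states and two slots, even states share one chain and odd states the other. The evictions alternate between the chains and always take the oldest entry of the chosen chain, which is deterministic rather than random.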

With sleep sets the performance of state caching improves dramatically. The
transition/state ratio of one model decreases from 2.88 to 1.45 and a cache of
about 25% of the size of the state space suffices. For the other three models, the
use of sleep sets reduces the ratio from 2.80, 3.54, and 2.58 to 1.12, 1.04, and
1.04, respectively. For these, the cache performs spectacularly well: a cache size
of only 0.2% to 3% of the size of the state space suffices for a complete exploration
of the models. Unfortunately, the running time increases by a factor of between
2 and 4, but it is not clear whether this is caused by the implementation of the
sleep sets or the state caching.

2.2 Other Work Related to State Caching 

The performance of state caching using open addressing — also known as closed 
hashing — has also been investigated [5,19], but, due to space constraints, this 
approach is not considered here. Other techniques that may improve the
effectiveness of state caching have been suggested, sometimes in more general
contexts. These include the identification of states to replace preferentially
[7,10], heuristic state space exploration [4], probabilistic caching of states [3],
and precomputation to enhance replacement strategies [17].

3 Stratified Caching 

When a state cache contains only a few replaced states, the probability of revis- 
iting a deleted state is small, the probability that the state has a deleted child 
is smaller, the probability that the child also has a deleted child is smaller still, 
and so on. As the number of replacements grows, these probabilities increase to 
the point where practically every deleted state has at least one deleted child, 
and most states have several. Revisiting a deleted state in this situation leads 
to a considerable amount of redundant work; not only do large subtrees require 
re-exploration, but each revisited state is also re-inserted in the cache, pushing 
out yet another state, and causing a cascade of revisits. 

Stratified caching limits the redundant work by placing an upper bound on 
how much deeper than usual each branch of the depth-first tree needs to be 
explored, hereafter referred to as the extra depth. It does this by only replacing 
states at specified levels of the depth-first search tree. All states at a certain level 
form a “stratum” of states. 

It is generally not possible to know how deep the depth-first search will go 
and how many strata there will be, so strata are classified as “available” or 
“unavailable” for replacement based on their level modulo a certain number m. 
When a stratified caching strategy specifies that all strata of level k modulo m are 
available for replacement, this means that the states at levels k, k+m, k+2m, . . . 
may be replaced while the states of other strata must remain in the cache. 
Figure 1 shows available strata boxed in gray for the k = 1, m = 3 case. 
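
The availability rule is a simple test on the depth of a state. As a minimal sketch (the function name and the use of a set of residues are our own choices):

```python
def is_available(level, ks, m):
    """True if a state at depth `level` lies in a stratum that may be replaced.

    `ks` is the set of residues modulo m that are open for replacement.
    """
    return level % m in ks


# The k = 1, m = 3 case for depths 0 to 7.
available_levels = [lvl for lvl in range(8) if is_available(lvl, {1}, 3)]
print(available_levels)
```

For k = 1 and m = 3 this marks depths 1, 4 and 7 as available among the first eight levels.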

Assuming that the depth-first tree is deep enough, that the states of the state
space are uniformly distributed over the different levels modulo m, and that the
probability of a revisit is uniformly distributed over the states, the expected 
extra depth is at most 1/m and the maximum extra depth is 1. As the search 
progresses, an available state can be replaced by either another available state, 







J. Geldenhuys 



Fig. 1. Stratified caching for the k = 1, m = 3 case (depth levels 0 to 7)



in which case the expected extra depth remains constant, or by an unavailable 
state, in which case the expected extra depth decreases slightly. Unfortunately, 
because at most 1/m of the states are available for replacement, the cache is 
quickly exhausted (i.e., filled only with unreplaceable states) and the search
must be aborted. Smaller values of m increase the number of available strata 
and therefore the fraction of replaceable states, but the minimum value for m is 
2, meaning that at most 1/2 of states can be available for replacement in this 
setting. Another way to increase the number of replaceable states is to increase 
the number of available strata within each modulo group. For example, if strata 
2 and 3 modulo 5 are available for replacement, the expected extra depth is at 
most 3/5 and the maximum extra depth is 2. 

In this paper, the following approach is taken: Initially, k = 1 and m = 2.
Once the available states are exhausted, all the states in the odd (available)
strata are states currently on the depth-first search stack. (If a state from an odd
stratum were not on the stack, it would have been available for replacement, but
the cache has been exhausted and there are no more replaceable states.) Instead
of aborting the search at this point, the value of m is doubled to 4, and k is
changed to 1 . . . 3. This process may be repeated several times, as illustrated in
Figure 2. After the nth doubling, $m = 2^{n+1}$, $k = 1 \ldots m - 1$, the expected extra
depth is at most $(m - 1)/2$ and the maximum extra depth is $m - 1$.
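
The bounds quoted in this section can be checked with a short calculation. The Python sketch below reflects our reading of "extra depth": a revisit that starts in a replaceable stratum may have to continue through every directly following replaceable stratum before it reaches a level that is guaranteed to be stored. It assumes the worst case in which every replaceable state has in fact been replaced, and the function names are hypothetical.

```python
from fractions import Fraction


def extra_depth(ks, m):
    """Return (expected bound, maximum) extra depth for residues `ks` modulo m.

    Levels are assumed to be uniformly distributed modulo m, as in the text.
    """
    def run_length(r):
        # Number of consecutive replaceable strata starting at residue r.
        n = 0
        while n < m and (r + n) % m in ks:
            n += 1
        return n

    runs = [run_length(r) for r in range(m)]
    return Fraction(sum(runs), m), max(runs)


def doubling_schedule(doublings):
    """Yield (ks, m) for the initial setting and each doubling of m."""
    m = 2
    for _ in range(doublings + 1):
        yield set(range(1, m)), m
        m *= 2


print(extra_depth({1}, 3))      # one stratum in three
print(extra_depth({2, 3}, 5))   # strata 2 and 3 modulo 5
for ks, m in doubling_schedule(2):
    print(m, extra_depth(ks, m))
```

Under these assumptions the sketch gives 1/3 and 1 for the k = 1, m = 3 case, 3/5 and 2 for strata 2 and 3 modulo 5, and (m - 1)/2 and m - 1 after each doubling, in agreement with the figures stated in the text.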

An idea similar to stratified caching has been described in [2]. There the 
authors investigate several heuristics that indicate whether or not a particular 
state should be stored at all. In fact, that work focuses on the minimal set of 
states that need to be stored, in other words, heuristics for selecting a covering 
set of vertices. Storing all states and selectively replacing some holds 
the obvious advantage of avoiding a potentially significant amount of work, but 
it is also true that, for some models, storing only some of the states can improve 
performance even further. 

Fig. 2. Stratified caching with doubling modulo (panels: k = 1, m = 2; k = 1 ... 3, m = 4; k = 1 ... 7, m = 8)



4 Experiments with Random and Real Graphs 

Our initial goal was to compare stratified caching and a random replacement 
strategy, but doubts arose about the optimality of random replacement. To re- 
solve the matter, we conducted a set of experiments that explore graphs using 
state caching and a variety of state replacement strategies. 

The replacement strategies are based on the following five state attributes: 
(E) stack entry time, (X) stack exit time, (D) search depth, (I) current indegree, 
and (O) current outdegree. A specific replacement strategy is a combination of 
attributes (denoted by uppercase letters) and negated attributes (denoted by 
lowercase letters). For example, the specification “DiX” indicates that states 
are first ordered by their ascending depth, then by their descending indegree, 
and finally by the ascending time of stack exit. When a state is selected for 
replacement, the least state according to this ordering is chosen. 
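
A specification can be read as a lexicographic sort key. The sketch below is one possible encoding, not the paper's implementation: states are plain dictionaries holding the five attributes, and the cache contents are invented for illustration.

```python
def make_key(spec):
    """Build a sort key from a specification such as "DiX".

    Uppercase letters order by ascending attribute value, lowercase
    letters by descending value; earlier letters take precedence.
    """
    def key(state):
        return tuple(
            state[c] if c.isupper() else -state[c.upper()]
            for c in spec
        )
    return key


def select_victim(states, spec):
    """Return the least state under the ordering described by spec."""
    return min(states, key=make_key(spec))


# Hypothetical cache contents: each state records the five attributes.
cache = [
    {"name": "a", "E": 1, "X": 9, "D": 2, "I": 3, "O": 1},
    {"name": "b", "E": 2, "X": 4, "D": 2, "I": 7, "O": 0},
    {"name": "c", "E": 3, "X": 6, "D": 2, "I": 7, "O": 2},
    {"name": "d", "E": 4, "X": 5, "D": 5, "I": 9, "O": 1},
]
victim = select_victim(cache, "DiX")
print(victim["name"])
```

Here states a, b and c share the smallest depth, b and c share the largest indegree among them, and b left the stack earlier, so b is selected.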

Only attributes E and X are unique to each state. Therefore, a specification 
such as “dE” is unambiguous, as there is only one deepest, least recently entered 
state, while the specification “Io” is ambiguous, as there can be many states 
with minimal indegree and maximal outdegree. Note that the order of attributes 
is important: “OIX” and “IOX” describe two different replacement strategies. 

In addition to the five attributes, two pseudo-attributes are used: (R) random 
selection, and (S) stratified caching. When the R pseudo-attribute is added to 
an ambiguous specification “s”, states for replacement are chosen by randomly 
selecting one of the eligible states specified by s, and the resulting specifica- 
tion “sR” is unambiguous. The S pseudo-attribute can be combined with any 
unambiguous specification “s” to yield a strategy that selects a state within 
the available strata according to s, and employs the doubling-modulo stratified 
caching strategy discussed in the last section. 

It is not easy to compare these strategies to those in [11]. Strategy H1 is 
clearly something of the form “i...”, but in [11] a complete scan of the state 
store is performed to find the most frequently visited state, and it is therefore 
influenced by the hash function. Likewise, H2 corresponds to something of the 
form “I...”. In [11] strategy H4 is said to select the oldest state in the cache, 
which is what “E” does, but H4 is often referred to as random replacement, 
which corresponds to “R”. H3 and H5 have no direct counterparts among the 
strategies investigated. 

In total, 790 unambiguous specifications were investigated. The specifications 
are not only unambiguous, but also independent of the state storage scheme. 
(Note that it is not really possible to implement an ambiguous specification — 
the ambiguity must be resolved in some undisclosed way — and such specifica- 
tions were not considered.) 
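
The figure of 790 can be reproduced under one plausible reading of the rules above. This reading is an inference, not a construction stated in the text: a specification is an ordered selection of distinct attributes from D, I and O, each plain or negated, followed by exactly one disambiguating attribute (E, e, X, x or R), optionally followed by S.

```python
from itertools import permutations, product


def unambiguous_specs():
    """Enumerate specifications under the assumed grammar."""
    specs = []
    for length in range(4):  # 0 to 3 ambiguous attributes
        for attrs in permutations("DIO", length):
            for cases in product((str.upper, str.lower), repeat=length):
                prefix = "".join(f(a) for f, a in zip(cases, attrs))
                for last in ("E", "e", "X", "x", "R"):
                    specs.append(prefix + last)        # plain strategy
                    specs.append(prefix + last + "S")  # stratified variant
    return specs


specs = unambiguous_specs()
print(len(specs))
```

There are 1 + 6 + 24 + 48 = 79 possible prefixes, 5 disambiguating attributes and 2 variants, which gives 79 x 5 x 2 = 790.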

The experiments in this section focus on the cache size and the amount of 
redundant work; memory and time consumption are not at issue here. Experience 
has shown that, if care is taken with the implementation, redundant work gives 
a good indication of the time consumption. Apart from empirical observation, 
there is also an intuitive argument to support this claim: The proportion of 
extra states explored brings about an equal proportion of extra transitions, and 
the number of transitions is the dominating factor in the execution time of a 
state exploration tool. Accurate calculation of the memory consumption of the 
different specifications is also possible, but, for lack of space, we do not discuss it 
here. It is important to note, however, that the optimality of the strategies should 
be judged on their memory consumption and not the amount of redundant work 
involved. 

4.1 Experiments with Random Graphs 

In the first set of experiments, random graphs were generated using the same 
method described in [14], and shown in Figure 3. Each random graph is deter- 
mined by three parameters: the desired number of states S, the maximum out- 
degree D of any node, and a seed R for the random number generator. For both 
random graph generation and random replacement strategies, the (2^{19937} - 1)-period 
Mersenne Twister random number generator [16] is used. 

States are generated in a breadth-first fashion; for each new state an outdegree 
d is chosen uniformly in the range 0 ... D; the probability that an outgoing 
transition leads to a new state is 0.5 for the first ⌊S/2⌋ states and then decreases 
linearly until it reaches 0 when the number of states reaches S; destination states 
of transitions that lead to old states are chosen uniformly from among the 
already generated states. This algorithm may terminate before enough states have 
been generated, if, for example, an outdegree d = 0 is chosen for the initial state. 
It is therefore repeated until at least 0.9S states have been generated. 

The graphs generated in this way are called unweighted and have a transi- 
tion/state ratio of D/2. However, it is not only the average number but also 
the distribution of revisits that affect the performance of state caching. The 
algorithm was therefore adjusted to also generate weighted graphs: instead of 
choosing the revisited states by uniform selection (line 10 of Figure 3), each 
state was weighted by its number of incoming transitions. (The root state was 
given an additional incoming edge, since otherwise it would never be selected). 
Figure 4 shows the distribution of revisits for weighted and unweighted graphs; 
these match the distributions of many actual models. 

GenerateGraph(S, D, R)

 1  random seed := R
 2  repeat
 3      Enqueue(q, 0); n := 1
 4      while NotEmpty(q)
 5          s := Dequeue(q)
 6          d := Random(0 ... D)
 7          for i := 1 to d
 8              p := 1 - Max(0.5, n/S)
 9              if Random(0 ... 1) > p        {revisit an old state}
10                  t := Random(0 ... n - 1)
11              else                          {generate a new state}
12                  t := n; n := n + 1
13                  Enqueue(q, t)
14              endif
15              AddTransition(s, t)
16          endfor
17      endwhile
18  until n > .9 S

Fig. 3. Code for generation of random graphs
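
The generation procedure of Figure 3 can be sketched in runnable form as follows. Python's `random` module is also based on the Mersenne Twister, but its output stream will not reproduce the graphs used in the paper. The `weighted` flag and the indegree bookkeeping are an interpretation of the weighted variant described in the text, and all names are illustrative.

```python
import random
from collections import deque


def generate_graph(S, D, seed, weighted=False):
    """Return (number of states, list of transitions) for one random graph."""
    rng = random.Random(seed)
    while True:  # repeat until enough states have been generated
        edges = []
        indegree = [1]  # the root gets one extra incoming edge
        queue = deque([0])
        n = 1
        while queue:
            s = queue.popleft()
            d = rng.randint(0, D)  # outdegree, uniform in 0 ... D
            for _ in range(d):
                p = 1 - max(0.5, n / S)  # probability of a new state
                if rng.random() > p:
                    # Revisit an old state.
                    if weighted:
                        t = rng.choices(range(n), weights=indegree)[0]
                    else:
                        t = rng.randrange(n)
                else:
                    # Generate a new state.
                    t = n
                    n += 1
                    indegree.append(0)
                    queue.append(t)
                indegree[t] += 1
                edges.append((s, t))
        if n > 0.9 * S:
            return n, edges


n, edges = generate_graph(200, 10, seed=1)
print(n, len(edges))
```

Because the generator state carries over between attempts, a failed attempt (for example, a root with outdegree 0) is simply followed by a fresh one, as in the repeat loop of the figure.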





Fig. 4. The distribution of state revisits for (a) the unweighted random graph S = 
5000, D = 6, R = 22222222, and (b) the weighted random graph S = 5000, D = 6, 
R = 55555555. In (a), for example, roughly 22% of states are visited only once, and 
roughly 26% of states are visited twice. 



Thirty unweighted and thirty weighted random graphs were generated using 
the parameter values S = 5000, D = 6, 10, 20, and R = 11111111n for n = 
0, 1, ..., 9. Different values of S had no discernible effect on the results, and 5000 
was chosen as a representative value. Each random graph was explored using 
each of the unambiguous specifications as a cache replacement strategy. For the 

Fig. 5. The behaviour of the replacement strategy “iDx” (among the states with the 
highest indegree, select the shallowest states and, from these, choose the state that 
exited the depth-first stack first) on the unweighted random graph S = 5000, D = 
6, R = 0 (horizontal axis: cache size). The graph has 4505 states (shown as the 
horizontal dashed line), 13249 transitions and a maximum search depth of 1658 
(shown as the vertical dashed line). 



initial run, the cache size was the same as the state space size; during subsequent 
runs it was decremented in steps of 1 + ⌊S/400⌋, roughly 0.25% of the state space 
size. This process terminated once the number of visited states exceeded 10 times 
the size of the state space, or once the cache reported that it was full. 

A typical result is shown in Figure 5. The dot in the center shows, for example, 
that a cache of 48.3% the size of the state space produced 24975 reported visited 
states, while the actual number of states is only 4505. The reported/actual states 
ratio — which we call the redundant work factor — is therefore 5.54. The re- 
placement strategies are difficult to compare: minimum cache size is important, 
but so is the redundant work factor and the relationship between the two vari- 
ables. Moreover, the maximum tolerable redundant work factor is a subjective 
limit. 
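
To make the redundant work factor concrete, the following minimal sketch runs a depth-first search with a bounded state store. It is not the experimental code used for the figures: the graph is a tiny hand-made example, `pick_victim` stands in for a replacement strategy, and states on the stack are never replaced.

```python
def explore(succ, root, cache_size, pick_victim):
    """Depth-first search with state caching; returns the number of visits."""
    store = set()      # states currently held in the cache
    on_stack = set()   # states on the depth-first stack (not replaceable)
    visits = 0

    def enter(state):
        nonlocal visits
        if len(store) >= cache_size:
            replaceable = store - on_stack
            if not replaceable:
                raise MemoryError("cache exhausted")
            store.discard(pick_victim(replaceable))
        store.add(state)
        on_stack.add(state)
        visits += 1

    enter(root)
    stack = [(root, iter(succ[root]))]
    while stack:
        state, successors = stack[-1]
        for t in successors:
            if t not in store:  # unseen, or seen earlier and since replaced
                enter(t)
                stack.append((t, iter(succ[t])))
                break
        else:
            stack.pop()
            on_stack.discard(state)
    return visits


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
full = explore(graph, 0, cache_size=4, pick_victim=min)
good = explore(graph, 0, cache_size=3, pick_victim=min)
bad = explore(graph, 0, cache_size=3, pick_victim=max)
print(full, good, bad, bad / len(graph))
```

With a cache of three states, replacing state 1 costs nothing, whereas replacing state 3 forces it to be explored again and yields a redundant work factor of 5/4. A cache of two states is smaller than the maximum stack depth, so the search cannot complete.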

Table 1 presents the results of the experiments with weighted and unweighted 
graphs separated. Different limits on the redundant work factor are shown in the 
first column, labeled f. In each row, the best strategies and their performance 
are shown for the ten graphs with D = 6, the ten graphs with D = 10, the ten 
graphs with D = 20, and, in the last major column, the thirty graphs combined. 
The best strategy was determined by averaging the minimum cache size data 
points for each strategy and selecting the strategy with the lowest average. In the 
case of more than one minimum, the strategy with the lowest average redundant 
work factor is shown. In a few cases two strategies of the form “se” and “sx” (or 
“sE” and “sX”) performed equally well. The minor columns give the averaged 
minimum cache size as a percentage of the state space size, the redundant work 
factor, and the strategy specifications selected as the best. Lastly, the value of 
m in the heading row is the maximum stack depth expressed as a percentage of 
the state space size; this is the minimum possible size of the cache. 

Table 1. State caching results for random graphs (panels: unweighted random graphs, weighted random graphs)

The first thing to note is that the table gives a very narrow view of the 
results, since there is no indication of how other strategies fared. For example, 
for unweighted graphs with D = 20 and f = 4, there are 14 other strategies 
that attain the 76.88% minimum cache size; they are not shown because their 
redundant work factor exceeded that of “ODIe/x”. In total, 32 strategies came in 
below 77%, 65 strategies below 80%, and 230 strategies below 85%. The 
disadvantage of not showing all these results is, however, balanced by the consistency 
of the results, not to mention the problem of presenting such a volume of data. 

The random replacement and the stratified caching strategies did not fare 
well enough to appear in the table. For unweighted graphs the average minimum 
cache size for random replacement (“R”) is between 7.41% and 25.73% higher 
than that of the best strategy. Although stratified caching (specifically “eS”) 
fared consistently better, its figures are still between 4.11% and 16.18% higher. 

Instead, the “OI...” and “IO...” strategies dominate. The I attribute selects 
states with low indegree. Even if, as reported in [11], the number of past visits 
is not highly correlated with future visits, this approach works since, if many 
states are visited only once or twice, it is right more often than it is wrong. The 
O attribute selects states with low outdegree. If these states are revisited, there 
are fewer children that may have been replaced and need to be re-investigated. 

The D attribute also appears several times among the more highly connected 
graphs. This suggests that back edges are more likely to lead to closer levels, 
making it pay off to replace the shallowest states first. This may be due to the 
generation process: since there are more states at the deeper levels, deeper states 
have a higher probability of being selected for revisits. As to the occurrence of 
the e/x and E/X patterns, we believe that the disambiguating attribute (one 
of E/e/X/x/R) in all of the specifications is somewhat arbitrary, although the e 
attribute seems to have a slight edge. 

For the moment, all these observations are speculative, and warrant further 
investigation. It is, however, clear that a redundant work limit of f = 10 is too 
generous; even when f = 5 the best strategies came within 5% of the absolute 
minimum cache size. 

4.2 Experiments with Real Graphs 

In addition to random graphs, several Promela models were converted (after 
partial order reduction by the Spin tool) to graphs and explored as above, but 
with a redundant work factor limit of f = 5; the results are shown in Table 2. 
The columns show the model name, the number of states (in column states), 
the transition/state ratio (column d), the maximum stack depth as a percentage 
of S (column m), the minimum cache size as a percentage of S (column s), the 
redundant work factor (column rwf), and the number of best strategies and their 
specifications. 

The results fall into roughly three groups: for the first group (dining, 
eratosthenes, mobile2, schedule, and gobackn2) “OX” (i.e., least outdegree and then least 
recently removed from stack) strategies work best, for the second group (dbm, 
X509, mobile1, petersonN, rap, pftp, slide, and gobackn) “RS” (stratified caching 
with random selection of available states) strategies work best, and for the last 
group (the other graphs) the best strategies were more varied. At present we are 
unable to explain this grouping, but hope to investigate it further. 

When the performance (defined as s - m) of the strategies is averaged over 
the graphs and sorted, the stratified caching strategies occupy the top half of 
the list. In other words, the worst strategy with stratified caching outperforms 
the best strategy without, or to put it another way, of the replacement strategies 
previously considered in the literature, none achieved a better place than halfway 
down this list. At the top of the list is the “RS” (stratified caching with a doubling 
modulo in combination with random selection) strategy: its average minimum 
cache size is 5.41% higher than m, and its average redundant work factor is 3.12. 

5 A Spin Implementation 

To further investigate the viability of some state caching schemes, the Spin 
model checker was modified to include state caching. Three alternatives to out- 
of-the-box Spin were investigated. The first alternative uses a cache replacement 

Table 2. State caching results for graphs based on actual models

Model          states    d      m      s    rwf   Best strategies
mutex             428  2.0  15.42  24.53   3.16    3  IeS ieS eS
abp2             1440  1.3  23.26  23.40   1.02    5  iOe Oie OIe IOe Oe
cabp             2048  4.1  38.13  47.41   4.37   14  OdIeS OdIxS OdieS OdeS ...
dining           2071  3.6  71.32  74.02   3.99    5  OiXS IOXS OIXS OXS iOXS
eratosthenes     2092  1.2  10.71  12.24   1.51    5  iOX OiX OIX IOX OX
mobile2          3300  2.0  23.36  23.79   4.94    5  iOX OiX OIX IOX OX
schedule         3328  1.3  16.29  16.77   1.37    5  iOX OiX OIX IOX OX
dbm              5111  4.0   1.96  15.07   4.58    1  iRS
X509             6093  2.0   0.89  13.52   4.68    1  RS
snoopy           9342  1.4  16.73  17.00   1.79    1  OiR
mobile1          9970  2.0   7.16  13.75   3.86    1  RS
gobackn2        14644  1.8  30.47  31.28   4.71    5  iOX OiX OIX IOX OX
petersonN       16719  1.8  29.40  31.17   3.82    1  RS
rap             26887  7.9   0.23  18.16   1.81    1  RS
pftp            47355  1.4   3.18   6.91   3.39    1  RS
riaan           67044  1.1   0.26   1.27   3.32    1  OR
slide           89910  1.5  11.07  12.91   4.48    1  ORS
gobackn         90209  1.5  11.03  12.32   4.65    1  OIdRS



Table 3. Spin state caching results for a model of leader election in a general graph

Memory      Spin 4.1.0        + [7]            + OX             + XS
limit       rwf    time     rwf     time     rwf     time     rwf    time
No limit   1.00    2.16      -        -        -       -        -      -
100          -       -     1.00     2.29     1.00    2.26     1.00   2.26
80           -       -     1.00     5.55     1.06    2.49     1.00   2.36
60           -       -     1.02    17.82     1.13    2.71     1.00   2.78
50           -       -     1.04    29.84     1.16    3.10     1.00   2.50
40           -       -     1.07    47.21     1.19    2.91     1.01   2.55
30           -       -     1.16    76.09     1.22    3.01     1.01   2.64
20           -       -     2.28   231.97     1.25    3.09     1.04   2.70
10           -       -       -        -     26.19   63.89     1.14   3.03



scheme identical to that of [7]. The second new version implements the OX strat- 
egy, identified as one of the “winners” in the previous section. It would have 
been instructive to investigate the RS strategy also, but true random selection 
is nontrivial to implement, especially in Spin where state cache entries do not 
have a uniform length and cannot be determined a priori. It would be possible 
to ask the user to specify the size of the state vector, and this may not be un- 
reasonable. However, to make the comparison as fair as possible, it was decided 
to implement the XS strategy as the third alternative Spin version. 

Table 4. Performance of bitstate hashing with respect to partial/full exploration

-w param.   states   trans.   time
22          276112   350318   2.51
23          279982   354744   2.57
24          280880   355762   2.59
25          281155   356069   2.61
26          281207   356127   2.63
27          281261   356190   2.66
28          281263   356192   2.72



Table 3 shows the results of running the three new versions of Spin on a 
single model of the echo algorithm with extinction for electing leaders in an 
arbitrary network, as described in [20, Chapter 7]. The memory limit is given in 
units of 2^{20} bytes, the redundant work factor is defined as before, and the time is given in 
seconds. The experiments were performed on a 2.6GHz Pentium 4 machine with 
512 megabytes of memory running Linux. These are only preliminary results 
and further experiments on more models are required to form a better idea of 
how well stratified caching behaves. From the figures in the table it is however 
clear that for this particular model this form of stratified caching outperforms 
the other two implementations. 

While the model may be quite small (it has only 281263 states and 356192 
transitions), it is interesting to look at the behaviour of bitstate hashing. Table 4 
shows the value of the -w parameter which specifies the number of bits to use 
for bitstate hashing, the number of states and transitions reported by Spin, and 
the time in seconds. Only when 2^{28} bits = 32 megabytes are used does the 
system investigate the full state space. This does not reflect on bitstate hashing 
in general: even when it does not explore the full state space, it can find true 
errors in models. However, the important point is that state caching techniques 
can extend the limit beyond which we are forced to resort to partial exploration 
of state spaces. 
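
The arithmetic behind this comparison can be sketched as follows. The figures are copied from Table 4; the helper names are illustrative, and coverage is taken relative to the full state space of 281263 states.

```python
# Figures copied from Table 4: -w parameter -> (states, transitions, seconds).
TABLE_4 = {
    22: (276112, 350318, 2.51),
    23: (279982, 354744, 2.57),
    24: (280880, 355762, 2.59),
    25: (281155, 356069, 2.61),
    26: (281207, 356127, 2.63),
    27: (281261, 356190, 2.66),
    28: (281263, 356192, 2.72),
}
FULL_STATE_SPACE = 281263


def bit_array_megabytes(w):
    """Size of a bit array of 2**w bits, in units of 2**20 bytes."""
    return 2 ** w / 8 / 2 ** 20


def coverage(w):
    """Fraction of the full state space reported for a given -w value."""
    return TABLE_4[w][0] / FULL_STATE_SPACE


for w in sorted(TABLE_4):
    print(w, bit_array_megabytes(w), round(100 * coverage(w), 2))
```

By comparison, Table 3 shows the XS version completing a full exploration with a memory limit of 10 units of 2^{20} bytes.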



6 Conclusions 

We have shown, for both random graphs and graphs based on actual models, 
that for a substantial but not unlimited increase in running time, the cache 
can often be reduced to close to the minimum size imposed by the maximum 
stack depth. The maximum increase in running time is close to tenfold for the 
experiments in Table 1, and almost fivefold in Table 2; for the XS strategy in 
Table 3 the running time is only at most 1.4 times more than normal. 

The results of the experiments are not as easy to interpret as we might have 
hoped. The strategies that worked best with random graphs are very different 
from those that worked best with actual models. For random graphs the results 
were more consistent than for actual models, perhaps indicating that the 
generated graphs represent only one “kind” of model. The experiments did however 
show that, by itself, random replacement is never the best strategy. Stratified 
caching in combination with random replacement emerged as the best strategy 
for almost half of the graphs based on actual models. In other cases, strategies 
based on the minimal outdegree and indegree of states proved successful. 

We believe that stratified caching can be improved even further by selecting 
available strata based on the average outdegree, average indegree, or simply the 
number of states in a stratum. Its performance in the experiments makes it a 
strong candidate for the replacement strategy, whenever state caching is used. 

Abstract. Most approaches for model checking software are based on 
the generation of abstract models from source code, which may greatly 
reduce the search space, but may also introduce errors that are not present 
in the actual program. 

In this paper, we propose a new model checker for the verification of 
native C++ programs. To allow platform-independent model checking of the 
object code for concurrent programs, we have extended an existing 
virtual machine for C++ to include multi-threading and different exploration 
algorithms on a dynamic state description. 

The error reporting capabilities and the lengths of counter-examples are 
improved by using heuristic estimator functions and state space 
compaction techniques that additionally reduce the exploration efforts. 

The evaluation of four scalable simple example problems shows that our 
system StEAM¹ can successfully enhance the detection of deadlocks and 
assertion violations. 



1 Introduction 

Model checking [4] refers to exhaustive exploration of a system with the intention 
to prove that it satisfies one or more formal properties. After successful applica- 
tion in fields like hardware design, process engineering and protocol verification, 
some recent efforts [3,13,18,23] exploit model checking for the verification of 
actual programs written in, e.g., Java or C. 

Most of these approaches rely on the extraction of a formal model from the 
source code of the program. Such a model can in turn be converted into the input 
language of an existing model checker (e.g. Spin [17]). The main advantage of 
abstract models is the reduction of state space. 

Some model checkers, e.g. dSpin [6], also consider dynamic aspects of 
computer programs, like memory allocation and dynamic object creation. These 
aspects must be mapped to the respective description language. If this is not 
carefully addressed, the model may not correctly reflect all aspects of the actual 
program. Moreover, some of these approaches require manual intervention by 
the user, which means that one has to be familiar with the input language of the 
model checker. 

¹ State Exploring Assembly Model Checker 

A different approach is considered in the Java model checker JPF. Instead 
of extracting a model from the source code, JPF [24] uses a custom-made Java 
virtual machine to check a program on the byte-code level. This eliminates the 
problem of an inadequate model of the program, provided that the virtual 
machine works correctly. The developers of JPF chose Java as the target 
programming language for several reasons [25]: First, Java features object-orientation 
and multi-threading in one language. Second, Java is simple. And third, Java is 
compiled to byte-code and hence the analysis can be done on byte-code level (as 
opposed to a platform-specific machine code). Also it was decided to keep JPF 
as modular and understandable to others as possible, sacrificing speed. 

StEAM, the model checker presented in this paper, addresses a more general 
approach to such low-level program model checking. Based on a virtual 
processor, called the Internet Virtual Machine (IVM), the tool performs a search on 
machine code compiled from a C++ source. On the one hand, this provides the 
option to model-check programs written in the industry-standard programming 
language, while, on the other hand, the generic approach is extensible to any 
compiler-based programming language with reasonable effort. Our method of 
reduction keeps state spaces small to compete with the memory efficiency of other 
model checkers that apply model abstraction. 

The architecture of StEAM is inspired by JPF. However, the developers faced 
some additional challenges. First, there is no support for multi-threading in 
standard C++. The language as well as the virtual machine had to be extended. 
Second, since the virtual machine is written in plain C, so is StEAM. Although 
this increases development time, we believe that the model checker will in the 
long term benefit from the increased speed of C compared to Java. Memory 
efficiency is one of the most important issues of program model checking. There 
are various options for a time-space trade-off to save memory and explore larger 
state spaces. However, such techniques require that the underlying tool is fast. 
Moreover, StEAM successfully ties the model checking algorithm to an existing 
virtual machine, a task thought impossible by the developers of JPF [25]. 

The paper is structured as follows. First, it introduces the architecture of 
the system. Next, it shows which extensions were necessary to enable program 
model checking, namely the storage of system states, the introduction of non- 
determinism through threads, and different exploration algorithms to traverse 
the state space in order to validate the design or to report errors. We illustrate 
the approach with a small example. The complex system state representation 
in StEAM is studied in detail. We introduce an apparent option for state space 
reduction and explain why heuristic estimates accelerate the detection of errors 
and improve the quality of counter-examples. Experiments show that StEAM 
effectively applies model checking to concurrent C++ programs. Finally, we relate 
StEAM to other work in model checking, and conclude. 

2 Architecture of the Internet C Virtual Machine 

The Internet C Virtual Machine (ICVM) by Bob Daley aims at creating a 
programming language that provides platform independence without the need of 
rewriting applications into proprietary languages like C# or Java. The main 
purpose of the project was to be able to receive precompiled programs through the 
Internet and run them on an arbitrary platform without re-compilation. 
Furthermore, the virtual machine was designed to run games, so simulation speed 
was crucial. 

The Virtual Machine. The virtual machine simulates a 32-bit CISC CPU with a 
set of approximately 64,000 instructions. The current version is already capable 
of running complex programs at decent speed, including the commercial game 
Doom². This is strong empirical evidence that the virtual machine works 
correctly. Thus, dynamic aspects are carefully addressed. IVM is publicly available 
as open source³. 

The Compiler. The compiler takes conventional C/C++ code and translates it 
into the machine code of the virtual machine. ICVM uses a modified version of 
the GNU C compiler gcc to compile its programs. The compiled code is stored in 
ELF (Executable and Linking Format), the common object file format for Linux 
binaries. The three types of file representable are object files, shared libraries 
and executables, but we will consider mostly executables. 

ELF Binaries. An ELF-binary is partitioned in sections describing different 
aspects of the object’s properties. The number of sections varies depending on 
the respective file. Important are the DATA and BSS sections. Together, the two 
sections represent the set of global variables of the program. 

The BSS section describes the set of non-initialized variables while the DATA 
section represents the set of variables that have an initial value assigned to them. 
When the program is executed, the system first loads the ELF file into memory. 
For the BSS section additional memory must be allocated, since non-initialized 
variables do not occupy space in the ELF file. 

Space for initialized variables, however, is reserved in the DATA section of 
the object file, so accesses to variables directly affect the memory image of the 
ELF binary. Other sections represent executable code, symbol table etc., not to 
be considered for memorizing the state description. 

3 Multi-threading 

In the course of our project, the virtual machine was extended with multi- 
threading capabilities, a description of the search space of a program, as well 
as some special-purpose program statements which enable the user to describe 

² www.doomworld.com/classicdoom/ports 

³ ivm.sourceforge.net 

Fig. 1. System state of a StEAM program (legend: physical memory, VM-memory, program memory). 



and guide the search. Figure 1 shows the components that form the state of a 
concurrent program for StEAM. 

System Memory Hierarchy. Memory is organized in three layers: Outermost is 
the physical memory which is only visible to the model checker. The subset VM- 
memory is also visible to the virtual machine and contains information about 
the main thread, i.e., the thread containing the main method of the program to 
check. The program memory forms a subset of the VM-memory and contains 
regions that are dynamically allocated by the program. 

Stacks and Machines. For n threads, we have stacks s_1, ..., s_n and machines 
m_1, ..., m_n, where s_1 and m_1 correspond to the main thread that is created 
when the verification process starts. Therefore, they reside in VM-memory. The 
machines contain the hardware registers of the virtual machine, such as the 
program counter (PC) and the stack and frame pointers (SP, FP). Before the 
next step of a thread can be executed, the content of machine registers and stack 
must refer to the state immediately after the last execution of the same thread, 
or, if it is new, directly after initialization. 

Dynamic Process Creation. From the running threads, new threads can be 
created dynamically. Such a creation is recognized by StEAM through a specific 
pattern of machine instructions. Program counters (PCs) indicate the byte 
offset of the next machine instruction to be executed by the respective thread, i.e., 
they point to some position within the code section of the object file's memory 
image (MI). MI also contains the information about the DATA and BSS sections. 
Note that (in contrast to the DATA section) the space for storing the contents 
of the variables declared in the BSS section lies outside the MI and is allocated 
separately. 

Memory- and Lock-Pool. The memory-pool is used by StEAM to manage dynam- 
ically allocated memory. It consists of an AVL-tree of entries (memory nodes), 
one for each memory region. They contain a pointer to address space which is 
also the search key, as well as some additional information such as the identity 
of the thread, from which it was allocated. 

The lock-pool stores information about locked resources. Again an AVL-tree 
stores lock information. 
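
The role of the memory-pool can be illustrated with a small sketch. It is not StEAM's data structure: Python's standard library has no AVL-tree, so a sorted list searched with `bisect` stands in for the balanced tree, and the class and method names are hypothetical.

```python
import bisect


class MemoryPool:
    """Memory nodes keyed by start address (sorted list instead of an AVL-tree)."""

    def __init__(self):
        self._addrs = []  # sorted start addresses, used as search keys
        self._nodes = {}  # start address -> (size, allocating thread id)

    def allocate(self, addr, size, thread_id):
        """Register a newly allocated region."""
        bisect.insort(self._addrs, addr)
        self._nodes[addr] = (size, thread_id)

    def free(self, addr):
        """Remove the region that starts at addr."""
        i = bisect.bisect_left(self._addrs, addr)
        if i == len(self._addrs) or self._addrs[i] != addr:
            raise KeyError(addr)
        del self._addrs[i]
        del self._nodes[addr]

    def find(self, ptr):
        """Return (start, size, thread id) of the region containing ptr, or None."""
        i = bisect.bisect_right(self._addrs, ptr) - 1
        if i < 0:
            return None
        start = self._addrs[i]
        size, thread_id = self._nodes[start]
        return (start, size, thread_id) if ptr < start + size else None


pool = MemoryPool()
pool.allocate(1000, 16, thread_id=1)
pool.allocate(2000, 8, thread_id=2)
print(pool.find(1008), pool.find(1016))
```

A balanced tree gives logarithmic insertion as well as lookup, which the sorted list does not; the sketch only shows how the address serves as the search key and how the owning thread is recorded.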

4 Exploration 

There is a core difference between the execution of a multi-threaded program and 
the exploration of its state space. In the first case, it suffices to restore machine 
registers and stack content of the executed thread. 

To explore a program state space, the model checker must restore the state 
of DATA and BSS, as well as the memory and lock pool. Although StEAM does 
support program simulation, we consider only exploration. 

Special Command Patterns. On the programming level, multi-threading is
realized through a base class ICVMThread, from which all thread classes must be
derived. A class derived from ICVMThread must implement the methods start,
run and die. After creating an instance of the derived thread class, a call to
start will initiate the thread execution.

The run-method is called from the start-method and must contain the actual
thread code. New commands, e.g. VLOCK and VUNLOCK for locking, have been
integrated using macros. The compiler translates them to ordinary C++ code which
does not influence the user-defined program variables. During program
verification, these code patterns are detected in the virtual machine, where the
special commands, like locking, are executed. This way of integration avoids
manipulation of the compiler.

Example. Table 1 shows a simple program glob which generates two threads
from a derived thread class MyThread, that access a shared variable glob.

The main program calls an atomic block of code to create the threads.
Such a block is defined by a pair of BEGINATOMIC and ENDATOMIC statements.
Upon creation, each thread is assigned a unique identifier ID by the
constructor of the super class. An instance of MyThread uses ID to apply the
statement glob=(glob+1)*ID.

Table 1. The source of the program glob.



    #include "IVMThread.h"
    #include "MyThread.h"

    extern int glob;

    class IVMThread;
    MyThread::MyThread()
      : IVMThread::IVMThread() {
    }
    void MyThread::start() {
      run();
      die();
    }

    void MyThread::run() {
      glob=(glob+1)*ID;
    }

    void MyThread::die() {
    }

    int MyThread::id_counter;


    #include <assert.h>
    #include "MyThread.h"
    #define N 2

    class MyThread;
    MyThread * t[N];
    int i,glob=0;

    void initThreads() {
      BEGINATOMIC
      for(i=0; i<N; i++) {
        t[i]=new MyThread();
        t[i]->start();
      }
      ENDATOMIC
    }

    void main() {
      initThreads();
      VASSERT(glob!=8);
    }



The main method contains a VASSERT statement. This statement takes a
boolean expression as its parameter and acts like an assertion in established
model checkers such as SPIN [17]. If StEAM finds a sequence of program
instructions (the trail) which leads to the line of the VASSERT statement, and
the corresponding system state violates the boolean expression, the model
checker prints the trail and terminates.

In the example, we check the program against the expression glob != 8.
Figure 2 shows the error trail of StEAM when applied to glob. Thread 1 denotes
the main thread; Thread 2 and Thread 3 are two instances of MyThread. The
returned error trail is easy to trace. First, instances of MyThread are generated
and started in one atomic step. Then the one-line run-method of Thread 3 is
executed, followed by the run-method of Thread 2. We can easily calculate why
the assertion is violated. After step 3, we have glob=(0+1)*3=3 and after step 5
we have glob=(3+1)*2=8. After this, the line containing the VASSERT-statement
is reached.
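
The arithmetic of the trail can be replayed outside the model checker. The
sketch below is only an illustration: the helper final_glob is a hypothetical
name, and it assumes that each thread executes its one-line run-method exactly
once, in the order given by the schedule.

```cpp
#include <vector>

// Applies glob = (glob + 1) * ID once per thread, in schedule order,
// starting from the initial value glob = 0 of the example program.
int final_glob(const std::vector<int>& schedule) {
    int glob = 0;
    for (int id : schedule) {
        glob = (glob + 1) * id;
    }
    return glob;
}
```

Calling final_glob({3, 2}) replays the trail above and yields 8, whereas
final_glob({2, 3}) yields 9.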

The assertion is only violated if the run-method of Thread 3 is executed
before that of Thread 2. Otherwise, glob would take the values 0, 2, and 9.
By default, StEAM uses depth-first search (DFS) for a program exploration. In
general, DFS finds an error quickly while having low memory requirements. As
a drawback, error trails found with DFS can become very long, in some cases
even too long to be traceable by the user. In the current version, StEAM supports
DFS, breadth-first search (BFS) and the heuristic search methods best-first (BF)
and A* (see e.g. [9]).

Fig. 2. The error-trail for the 'glob'-program.

Detecting Deadlocks. StEAM automatically checks for deadlocks during a
program exploration. A thread can gain and release exclusive access to a resource
using the statements VLOCK and VUNLOCK, which take as their parameter a pointer
to a base type or structure. When a thread attempts to lock an already locked
resource, it must wait until the lock is released. A deadlock describes a state
where all running threads wait for a lock to be released. A detailed example is
given in [21].
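
The deadlock condition can be stated as a small predicate over the thread
flags. This is a sketch under assumptions: ThreadInfo and is_deadlock are
hypothetical names, and the choice to report no deadlock when no thread is
running at all is ours, not necessarily that of StEAM.

```cpp
#include <vector>

// Per-thread flags, as an illustration of what a state would record.
struct ThreadInfo {
    bool alive;    // thread has not terminated
    bool blocked;  // thread waits for a lock to be released
};

// A state is a deadlock if at least one thread is still running and
// every running thread is blocked.
bool is_deadlock(const std::vector<ThreadInfo>& threads) {
    bool any_alive = false;
    for (const ThreadInfo& t : threads) {
        if (!t.alive) continue;        // terminated threads are ignored
        any_alive = true;
        if (!t.blocked) return false;  // some thread can still make a step
    }
    return any_alive;
}
```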

Hashing. StEAM uses a hash table to store already visited states. When ex- 
panding a state, only those successor states not in the hash table are added to 
the search tree. If the expansion of a state S yields no new states, then S forms 
a leaf in the search tree. To improve memory efficiency, we fully store only those
components of a state which differ from those of the predecessor state. If a
transition leaves a certain component unchanged - which is often the case for e.g.
the lock pool - only the reference to that component is copied to the new state.
This has proven to significantly reduce the memory requirements of a model 
checking run. The method is similar to the Collapse Mode used in Spin [16]. 
However, instead of component indices, StEAM directly stores the pointers to 
the structures describing respective state components. Also, only components of 
the immediate predecessor state are compared to those of the successor state. A 
redundant storing of two identical components is therefore possible. Additional 
savings may be gained through reduction techniques like heap symmetry [19], 
which are subject to further development of StEAM. 
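
The storage scheme can be sketched as follows. The fragment is an
illustration only: Component, State, share_or_store and make_successor are
hypothetical names, a state is reduced to two components, and reference
counting via std::shared_ptr is our assumption rather than a description of
the actual implementation.

```cpp
#include <memory>
#include <vector>

// Stand-in for one state component (e.g. DATA, BSS or the lock pool).
using Component = std::vector<int>;
using ComponentRef = std::shared_ptr<const Component>;

struct State {
    ComponentRef data;
    ComponentRef lock_pool;
};

// Compares a component only with that of the immediate predecessor:
// if unchanged, the pointer is copied; otherwise a new copy is stored.
ComponentRef share_or_store(const ComponentRef& old_component,
                            const Component& fresh) {
    if (fresh == *old_component) return old_component;
    return std::make_shared<Component>(fresh);
}

State make_successor(const State& pred, const Component& new_data,
                     const Component& new_locks) {
    State succ;
    succ.data = share_or_store(pred.data, new_data);
    succ.lock_pool = share_or_store(pred.lock_pool, new_locks);
    return succ;
}
```

Because only the immediate predecessor is consulted, two states further apart
in the search tree may still hold separate copies of identical components.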

5 Accelerating Error Detection 

Our approaches to accelerate error detection are twofold. First, we reduce 
the state space of assembly-level program state exploration. Second, we invent 
heuristics, which accelerate error detection, especially the search for deadlocks. 

Although StEAM can model check real C++ programs, it is limited in the
size of problems it can handle. Unmodified C++ programs have more instructions
than the abstract models that other model checkers take as their input. The state
space usually grows exponentially in the number of threads, with the number of
executed machine instructions in each thread as a factor. For each instruction,
all permutations of thread orders can occur and have to be explored.

Lock and Global Compaction. The occurrence of an error does not depend on
every execution order. The exploration of a single thread has to be interrupted
only after a lock/unlock or an access to shared variables. Accesses to local
memory cells cannot influence the behaviour of other threads. Therefore, we
realized two kinds of state reduction techniques.

The first one, called lock and global compaction, lgc for short, executes each
thread until the next access to shared memory cells. Technically, this is performed
by looking at the memory regions that each assembly-level instruction accesses
and at lock instructions.
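
A minimal sketch of this rule is given below. It assumes that every machine
instruction has already been classified by the kind of access it performs; the
enumeration Access and the function compacted_step_end are hypothetical names
introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of a machine instruction by what it touches.
enum class Access { Local, Shared, Lock, Unlock };

// Starting at position pc, a thread runs without interruption up to and
// including its next lock, unlock or shared-memory access. The returned
// value is the position at which the next thread switch is considered.
std::size_t compacted_step_end(const std::vector<Access>& code,
                               std::size_t pc) {
    while (pc < code.size()) {
        Access a = code[pc++];
        if (a != Access::Local) break;  // interrupt only after a non-local access
    }
    return pc;
}
```

For an instruction sequence Local, Local, Shared, Local, Lock, Local, a thread
starting at position 0 is interrupted at position 3 and then at position 5, so
six single-instruction steps collapse into three compacted ones.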

Source Line Compaction. The second kind of exploration, nolgc for short,
requires each source line to be atomic. No thread switch is allowed during the
execution of a single source line. This is not immediate, since each line of code
corresponds to a sequence of object code instructions.

The implication for the programmer is that infinite loops that, e.g., wait for
the change of a shared variable are not allowed in a single source line. We expect
the body of the loop to be unfolded in forthcoming source. The source line
compaction is only sound with respect to deadlock detection if read and write
access as well as lock and unlock access to the same variable are not included in
one line.

Both techniques reduce thread interleaving and link to the automated process
of partial order reduction, which has been implemented in many explicit state
model checking systems [20,7]. The difference is that we decide whether or
not a thread interleaving has to be considered by looking at the current
assembler instruction and the lock pool.

6 Directed Program Model Checking 

Heuristics have been successfully used to improve error detection in concurrent 
programs, see e.g. [12,8]. States are evaluated by an estimator function, measur- 
ing the distance to an error state, so that states closer to the faulty behavior 
have a higher priority and are considered earlier in the exploration process. If 
the system contains no error, there is no gain, the whole search space is enumer- 
ated. Compared to blind search, the only loss is due to additional computational 
resources for the heuristics. 
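
The role of the estimator can be illustrated by a small best-first search over
an explicitly given graph. The sketch makes several assumptions: states are
plain integers, adj and h are hypothetical inputs holding the successor lists
and the estimated distance to an error (lower values are expanded first), and
the function best_first is not part of StEAM.

```cpp
#include <functional>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Expands states in order of increasing estimate h until error_state is
// reached. Returns the sequence of expanded states. If no error state is
// reachable, every reachable state is expanded, as in blind search.
std::vector<int> best_first(const std::vector<std::vector<int>>& adj,
                            const std::vector<int>& h,
                            int start, int error_state) {
    using Entry = std::pair<int, int>;  // (estimate, state)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    std::set<int> visited{start};
    std::vector<int> expanded;

    open.push({h[start], start});
    while (!open.empty()) {
        int s = open.top().second;
        open.pop();
        expanded.push_back(s);
        if (s == error_state) break;
        for (int t : adj[s]) {
            if (visited.insert(t).second) {  // true only for unseen states
                open.push({h[t], t});
            }
        }
    }
    return expanded;
}
```

In this form the only overhead compared to an uninformed search is the
evaluation of h and the maintenance of the priority queue.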

Most-Block and Interleaving. An appropriate example for the detection of
deadlocks is the most-block heuristic. It favors states where more threads are
blocked. Another established estimate used for error detection in concurrent
programs is the interleaving heuristic. It relies on maximizing the interleaving
of thread executions [12].

In the following we consider new aspects to improve the design of estima- 
tor functions. In StEAM heuristics either realize single ideas or mixed ones, so 
we first start with three basic primitives: lock, shared variables and thread-id, 
followed by a treatment on how to combine them to higher-order functions. 

Lock. In the lock heuristic, we prefer states with more variable locks and more
threads alive. Locks are obvious preconditions for threads to become blocked.
Only threads that are still alive can get blocked in the future.

Shared Variable. Finally, we considered the access to shared variables. We prefer 
a change of the active thread after a global read or write access. The objective 
is that after accessing a global variable, other threads are likely to be affected. 

Thread-Id. Threads are of equal class in many cases. If threads have equal
program code and differ only in their thread-id, their internal behaviour is only
slightly different. If the threads are ordered in ascending order of their
id, we may prefer the last one.

The thread-id heuristic can be seen as a kind of symmetry reduction rule,
because we impose a preference ordering on similar threads, which puts a penalty
on the generation of equivalent states. Symmetry reduction based on
ordered thread-IDs and their PC values is e.g. analyzed in [2] and integrated into
dSpin. One advantage compared to other approaches to symmetry reduction is
that we encode the similarity measure into the estimator function. Therefore,
the approach is more flexible and can be combined easily with other heuristics.
Moreover, no explicit computation of canonical states takes place and the
approach is not specialized to a certain problem domain.

Favoring Patterns. Each thread has internal values and properties that we can 
use to define which thread execution we favor in an exploration step, e.g. thread- 
id (ID), PC, number of locked variables, number of executed instructions so 
far, and flags like blocked and alive. If blocked is set, the thread is waiting for 
a variable to be unlocked. If alive is unset, the thread will not be executed 
anymore: it is dead. 

We select a subset of all possible components to define heuristics. The pattern
for some relevant components for an example system state is shown in Figure 3.
We call those patterns favoring, as they favor states in the exploration. In the
following we explain one of the heuristics in use, namely the pbb heuristic.

If we simply add up the number of all blocked threads in a heuristic, we obtain
the mentioned most blocked heuristic. But many states have the same number of
blocked threads. The number of equivalent states that can be obtained from a
given state by permuting thread-ids and reordering the threads can be exponential
in the number of threads.

To favor only a few of the equivalent states, one concept of favoring patterns
we use is that of neighbor groups: maximal groups of consecutive threads having
a certain property, in our example the flag blocked. Additionally, we prefer
system states where the neighbor groups are rightmost. We abbreviate neighbor
group by group.

Fig. 3. Favoring patterns.

For a combined estimate for the entire state, we first square the size of each 
group. To express the preference for rightmost group, we multiply the obtained 
value with the largest thread-id in the group. Then the values for each group 
are added. In the example of Figure 3, we have value 12 for the first group and 
value 72 for the second group yielding the sum 94. 

To prefer states with more threads blocked, we additionally add the cubed 
number of the total of all blocked threads. For the maximum possible number of 
blocked threads this value is compatible with the previous sum. In our example 
five threads are blocked, so that the final preference value is 94+ 625 = 721. 
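
The recipe as worded above (squared group sizes weighted by the largest
thread-id of the group, plus the cubed number of blocked threads) can be
sketched in a few lines. The function name favoring_estimate is hypothetical,
thread-ids are assumed to run from 1 to n from left to right, and the state
used in the usage note is made up rather than taken from Figure 3.

```cpp
#include <cstddef>
#include <vector>

// blocked[i] is the blocked flag of the thread with id i + 1.
long favoring_estimate(const std::vector<bool>& blocked) {
    long group_sum = 0;  // sum over groups of size^2 * largest id in the group
    long total = 0;      // number of blocked threads in the whole state
    long run = 0;        // size of the group currently being scanned

    for (std::size_t i = 0; i < blocked.size(); ++i) {
        if (!blocked[i]) continue;
        ++run;
        ++total;
        bool group_ends = (i + 1 == blocked.size()) || !blocked[i + 1];
        if (group_ends) {
            group_sum += run * run * static_cast<long>(i + 1);
            run = 0;
        }
    }
    return group_sum + total * total * total;
}
```

For seven threads of which those with ids 1, 2, 5, 6 and 7 are blocked, the
groups contribute 4*2 = 8 and 9*7 = 63, and the five blocked threads add
5*5*5 = 125, giving 196.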

Table 2 summarizes the applied heuristics together with a brief description.
Newly contributed heuristics start with pba; known ones refer to [12].



7 Experimental Results 

An evaluation of StEAM has been performed on a Linux based PC (AMD Athlon 
XP 2200+ processor, 1 GB RAM, 1,800 MHz clock speed). The memory has been 
limited to 900 MB and the search time to 20 minutes. 

Models. We conducted experiments with four scalable programs, e.g. simple
communication protocols. Even though the selection is small and does not contain
very elaborate case studies, the state space complexity of the programs is
comparable to that of the code fragments that are often considered for program
model checking.

The first model is the implementation of a deadlock solution to the dining
philosophers problem (philo) as described in [10,21].

The second model implements an algorithm for the leader election protocol 
(leader). Here an error was seeded, which can cause more than one process to 
be elected as the leader. Both of the above models are scalable to an arbitrary 
number of processes. 










Table 2. Known (mb, int) and newly introduced heuristics.

Heuristic                  Abb.  Description
-------------------------  ----  ----------------------------------------------
most blocked               mb    Counts the number of blocked threads.

interleaving               int   The history of executed thread numbers is
                                 considered to prefer least recently executed
                                 threads.

preferred alive & blocked  pba   Adds the cubed number of all blocked threads,
                                 the squared number of all alive threads and,
                                 for each group of alive threads, the squared
                                 number of threads weighted with the rightmost
                                 thread-id.

preferred blocked          pbb   Adds the cubed number of all blocked threads,
                                 the squared number of all blocked threads and,
                                 for each group of blocked threads, the squared
                                 number of threads weighted with the rightmost
                                 thread-id.

read & write               rw    Prefers write access to shared variables and
                                 punishes read accesses without intermediate
                                 write access on each thread.

lock n' block              lnb   Counts the number of locks in all threads and
                                 the number of blocked threads.

preferred locked 1         pl1   Counts the squared number of locks in all
                                 threads and the squared number of blocked
                                 threads.

preferred locked 2         pl2   Counts the squared number of locks in all
                                 threads and the squared number of groups of
                                 blocked and alive threads, where rightmost
                                 groups are preferred by weighting each group
                                 with the largest thread-id.

alternating read & write   aa    Prefers alternating read and write access.



The third model is a C++ implementation of the optical telegraph protocol
(opttel), which is described in [15]. The model is scalable in the number of
telegraph stations and contains a deadlock.

The fourth model is an implementation of a bank automata scenario (cashit).
Several bank automata perform transactions on a global database (withdraw,
request, transfer). The model is scalable in the number of automata. A wrong
lock causes an access violation.



Undirected Search. The undirected search algorithms considered are BFS and
DFS. Table 3 shows the results. For the sake of brevity we only show the result
for the maximum scale (s) that could be applied to a model with a specific
combination of a search algorithm with or without state space compaction (c),
for which an error could be found. We measure trail length (l), the search time
(t in 0.1s) and the required memory (m in KByte). In the following n denotes
the scale factor of a model and msc the maximal scale.

In all models BFS already fails for small instances. For example in the dining
philosophers, BFS can only find a deadlock up to n = 6, while heuristic search
can go up to n = 190 with only insignificantly longer trails. DFS only has an
advantage in leader, since higher instances can be handled, but the produced
trails are longer than those of heuristic search.

Table 3. Results with Undirected Search.

Heuristic Search. Table 4 and Table 5 depict the obtained results for directed 
program model checking. 

The int heuristic, used with BF, shows some advantages compared to the
undirected search methods, but is clearly outperformed by the heuristics which
are specifically tailored to deadlocks and assertion violations. The rw heuristic
performs very strongly in both cashit and leader, since the error in both
protocols is based on process communication. In fact rw produces shorter error
trails than any other method (except BFS).

In contrast to our expectation, aa performs poorly on the model leader and
is even outperformed by BFS with respect to scale and trail length. For cashit,
however, aa (leading to the shortest trail) and rw are the only heuristics that
find an error for n = 2, but only if the lgc reduction is turned off. Both of these
phenomena of aa are subject to further investigation.

The lock heuristics are especially good in opttel and philo. With BF and lgc
they can be used up to an msc of 190 (philo) and 60 (opttel). They outperform
other heuristics with nolgc and the combination of A* and lgc. In case of A* and
nolgc the results are also good for philo (n=6 and n=5; msc is 7).

According to the experimental results, the sum of locked variables in continuous
blocks of threads with locked variables is a good heuristic measure to find
deadlocks. Only the lnb heuristic can compare with pl1 and pl2, leading to a
similar trail length. In the case of cashit, pl2 and rw outperform most heuristics
with BF and A*: with lgc they obtain an error trail of equal length, but rw needs
less time. In the case of A* and nolgc, pl2 is the only heuristic which leads to a
result for cashit. In most cases both pl heuristics are among the fastest.

The heuristic pba is in general moderate but sometimes outperforms the others,
e.g. with A* and nolgc on opttel (msc of 2) and philo (msc of 7). The heuristic
pbb is often among the best, e.g. leader with BF and lgc (n = 6, msc = 80),
philo with BF and lgc (n=150; msc is 190) and philo with A* and nolgc (n = 6,
msc = 7). Overall the heuristics pba and pbb are better suited to A*.

Table 4. Results with Best-First Search.

Experimental Summary. Figure 4 summarizes our results. We measure the
performance by extracting the fourth root of the product of trail length, processed
states, time and memory (geometric mean of the arguments). We use a log-scale
on both axes.
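
Written out, the performance measure is the geometric mean of the four
quantities. The function name performance below is a hypothetical choice, and
the units of the arguments are left as reported in the result tables.

```cpp
#include <cmath>

// Fourth root of the product of the four measurements, i.e. their
// geometric mean. Smaller values indicate a better run.
double performance(double trail_length, double states,
                   double time, double memory) {
    return std::pow(trail_length * states * time * memory, 0.25);
}
```

Because the four factors enter as a product, halving any single one of them
lowers the measure by the same factor, namely the fourth root of 2.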

In the case of the leader election protocol, the advantage of compaction is
apparent. The maximum possible scale almost doubles if the compaction is used.
The graphic shows that rw and DFS perform similarly; in the end DFS is a little
bit better with lgc, but rw is much better with nolgc. BFS, int, and aa behave
similarly in both cases.

In opttel, the heuristics lnb, pl1 and pl2 perform best. The pl2 heuristic
only starts to perform well with n = 15; before that, the curve has a high peak.
It seems that the preference of continuous blocks of alive or blocked threads
only pays off beyond a certain scale, here 10. The pba and pbb heuristics
perform similarly up to an msc of 9.

In philo, the heuristics pl1, pl2, pba, pbb perform best. If only BF is
considered, the heuristic lnb behaves similarly to pl1 and pl2. Again, pl2 has
an initial peak. DFS performs well up to an msc of 90.

In the experiments the new heuristics show an improvement in many cases. 
In the case of deadlock search the new lock and block heuristics are superior 
to most blocked. The lgc compaction often more than doubles the maximum 
possible model scale. 

Table 5. Results for A*.

8 Related Work

We discuss other projects that deal with model checking or software testing.

CMC [22], the C Model Checker, checks C and C++ implementations directly
by generating the state space of the analyzed system during execution. CMC
has mainly been used to check correctness properties of network protocols. The
checked correctness properties are assertion violations, a global invariant check
avoiding routing loops, sanity checks on table entries and messages, and memory
errors. CMC is specialized to event-based systems supporting process
communication through shared memory. The successor states are generated by
calling all possible event handlers from the given state. CMC is capable of
detecting an error between two events and of stating the kind of violation. The
only witness of an error is the sequence of processed events. A sequence of events
does not lead straightforwardly to the executed source lines, while a StEAM error
trail states every executed source line. To control the behaviour of the dynamic
memory allocation, malloc is overloaded, such that processes with the same
sequence of calls to malloc have the same memory map for all allocated variables.
StEAM identifies memory allocation and access by directly interpreting the
machine code.

Verisoft [11] uses the same approach as CMC. A scheduler emulates the process
environment and calls event handlers to generate all possible sequences of events.
In contrast to CMC, Verisoft does not store all processed states in a hash table.
It combines persistent sets and sleep sets, which refer to a notion of independence
of transitions, to restrict the search. This is advantageous if the search graph is
finite and acyclic. To avoid infinite search for arbitrary search graphs, the search
is depth-bounded. This approach minimizes memory usage, but may lead to
repeated computation of identical states that are not identified as such. The
limitation of the search depth can cause errors to be missed.

Fig. 4. Performance for philo/lgc, opttel/lgc, leader/nolgc, leader/lgc. x-axes denote
scale, y-axes denote the geometric mean of trail length, states, time and memory.



sC++ and AX. In [3], semantics are described to translate sC++ source code
into Promela - the input language of the model checker Spin [17]. The language
sC++ is an extension of C++ with concurrency. Many simplifying assumptions
are made for the modeling process: only basic types are considered, no structures,
no type definitions, no pointers. A similar approach is taken by the tool AX
(Automaton extractor) [18]. Here, Promela models can be extracted from C
source code at a user-defined level of abstraction.

BLAST [14], the Berkeley Abstraction Software Toolkit, is a model checker for C
programs, which is based on the property-driven construction and model checking
of software abstractions. The tool takes as its input a C program and a
safety monitor written in C. The verification process is based upon
counter-example driven refinement: Starting from an abstract model of the program
as a pushdown automaton, the tool checks if the model fulfills the desired property.
If an error state is found, BLAST automatically checks if the abstract
counterexample corresponds to a concrete counterexample in the actual program. If
this is not the case, an additional set of predicates is chosen to build a more
concrete model and the property is checked anew.

SLAM [1] also uses counter-example driven refinement. Here a boolean
abstraction of the program is constructed. Then a reachability analysis is
performed on the boolean program. Afterwards, additional predicates are discovered
to refine the boolean program - if necessary.

Bandera [5] constitutes a multi-functional tool for Java program verification.
Bandera is capable of extracting models from Java source code and converting
them to the input language of several well-known model checkers such as Spin
or SMV. As one very important option for state space reduction, Bandera allows
slicing of the source code.

Bogor [23] is a model checking framework with an extendible input language 
for defining domain-specific state space encodings, reductions and search algo- 
rithms. It allows domain experts to build a model checker optimized for their 
specific domain without in-depth knowledge about the implementation of a spe- 
cific model checker. The targeted domains include code, designs and abstractions 
of software layers. Bogor checks systems specified in a revised version of the BIR 
format, which is also used in Bandera [5,13]. 

9 Conclusion 

This paper introduces StEAM, an assembly-level C++ model checker. We give
insight into the structure and working of our tool. The purpose of the tool is to
show that the verification of actual C++ programs is possible without generating
an abstract model of the source code.

Our approach of directed program model checking outperforms undirected
search with respect to trail length and maximum scale. We further extended the
set of heuristics in model checking that are specifically tailored to find deadlocks
and assertion violations. Many of the newly invented heuristics perform better
than known heuristics. One option is favoring patterns, heuristics that relate to
symmetry reduction. Another contribution is the lgc compaction, which relates to
partial order reduction. Both approaches encode pruning options in the form of
state preference rules and significantly improve the performance with respect to
most search methods.

StEAM currently supports assertions for the definition of properties. This
already allows testing for safety properties and invariants, which may be sufficient
in many cases. However, for the verification of more complex properties, support
for temporal logics like LTL is desirable. Subsequent work will focus on how
the functionality of such logics can be implemented in a straightforward manner
that is accessible for practitioners. Also, we will add automatic detection of
illegal memory access.

In the future we will also consider new models, heuristics and search methods, 
e.g. we will integrate variations of DFS such as iterative deepening and related 
search methods to reduce the memory requirements. 

Abstract. Bitstate hashing in SPIN has proved invaluable in probabilistically
detecting errors in large models, but in many cases, the number of omitted states 
is much higher than it would be if SPIN allowed more than two hash functions to be 
used. For example, adding just one more hash function can reduce the probability 
of omitting states at all from 99% to under 3%. Because hash computation accounts 
for an overwhelming portion of the total execution cost of bitstate verification with 
SPIN, adding additional independent hash functions would slow down the process 
tremendously. We present efficient ways of computing multiple hash values that, 
despite sacrificing independence, give virtually the same accuracy and even yield 
a speed improvement in the two hash function case when compared to the current 
SPIN implementation. 

Another key to accurate bitstate hashing is utilizing as much memory as is
available. The current SPIN implementation is limited to only 512MB and allows only
power-of-two granularity (256MB, 128MB, etc). However, using 768MB instead
of 512MB could reduce the probability of a single omission from 20% to less than
one chance in 10,000, which demonstrates the magnitude of both the maximum
and the granularity limitation. We have modified SPIN to utilize any addressable 
amount of memory and use any number of efficiently-computed hash functions, 
and we present empirical results from extensive experimentation comparing var- 
ious configurations of our modified version to the original SPIN. 



1 Introduction 

“Bitstate verification” [10] is a term that has been used by the model checking commu- 
nity to refer to explicit-state model checking with Bloom filters. Explicit-state model 
checkers, such as Holzmann’s SPIN, have been used with great success in a variety of do- 
mains, including verification of finite-state, concurrent systems, such as cache coherence 
and network protocols. 

Much of the research in model checking is focused on tackling the state explosion
problem: a linear increase in the number of components leads to an exponential
increase in the size of the resulting models. State explosion is a particularly acute
problem in the context of explicit model checking, as memory requirements depend
linearly on the size of the state space. Because the most efficient explicit-state
techniques call for storing the set of all visited states in core memory, making the
representation of the visited set more compact means larger models can be explored
more quickly. By resorting to probabilistic methods of storing the visited set, the
representation can be made exceptionally compact, enabling much larger state
spaces to be tackled efficiently. The drawback of probabilistic methods, of course,
is that there is a possibility of omitting states with errors.

The Bloom filter is a popular choice of data structure for compactly storing sets [2].
The main parameter for tuning a Bloom filter is the number of hash functions used,
and the bitstate mode of SPIN utilizes a Bloom filter with 2 hash functions. In [17],
Wolper and Leroy promote using 20 hash functions instead of Holzmann's choice of just
2, for in many cases, SPIN would be more accurate if more hash functions were used.
However, Holzmann notes that the choice of 2 "was adopted in SPIN as a compromise
between runtime expense and coverage," and explains why using more hash functions
is impractical [11]:

In a well-tuned model checker, the run-time requirements of the search depend 
linearly on k[, the number of hash functions used]: computing hash values is 
the single most expensive operation that the model checker must perform. The 
larger the value of k, therefore, the longer the search for errors will take. In the 
model checker SPIN, for instance, a run with k = 90 would take approximately 
45 times longer than a run with k = 2. 
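
To make the role of k concrete, the following is a minimal textbook Bloom
filter, not the implementation used in SPIN and not the technique developed in
this paper. The class name BloomFilter, the encoding of states as strings and
the derivation of the i-th hash value from std::hash with the index appended
are all assumptions made for illustration; the latter in particular gives no
independence guarantee.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
public:
    BloomFilter(std::size_t bits, unsigned k) : bits_(bits, false), k_(k) {}

    // Sets the k bits selected for the state. Returns true if all of them
    // were already set, i.e. the state is treated as visited (possibly
    // wrongly, which is how states get omitted from the search).
    bool insert(const std::string& state) {
        bool seen = true;
        for (unsigned i = 0; i < k_; ++i) {  // one hash computation per i
            std::size_t idx = index(state, i);
            if (!bits_[idx]) {
                seen = false;
                bits_[idx] = true;
            }
        }
        return seen;
    }

    bool contains(const std::string& state) const {
        for (unsigned i = 0; i < k_; ++i) {
            if (!bits_[index(state, i)]) return false;
        }
        return true;
    }

private:
    // Illustrative i-th hash function: hash of the state with i appended.
    std::size_t index(const std::string& state, unsigned i) const {
        return std::hash<std::string>{}(state + '#' + std::to_string(i))
               % bits_.size();
    }

    std::vector<bool> bits_;
    unsigned k_;
};
```

Each call performs k full hash computations over the state, which is the
linear dependence on k described in the quotation above.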

We have discovered a Bloom filter enhancement that gives virtually the same effect 
as using more independent hash functions, but at a fraction of the runtime cost. For 
example, this technique alone can produce the effect of 20 hash functions with only 2.3 
times the cost of using 2 hash functions — far from Holzmann’s factor of 10. In the process 
of incorporating our technique into SPIN, we discovered other ways of improving the 
speed and accuracy of bitstate verification in SPIN. More specifically, we show that 
making more intelligent use of the Jenkins hash function [13] can significantly speed 
up verification. We tackle issues associated with accommodating an arbitrary amount 
of memory, and show how this simple issue can easily make orders of magnitude of 
difference in the possibility of incomplete coverage. 

This paper describes and evaluates the implementation decisions behind our modified 
version of SPIN. The analysis of our techniques in this paper is mostly experimental; 
a more formal, mathematical analysis of our techniques will appear elsewhere. We refer 
to our system as “Triple SPIN,” or 3SPIN, which is available on the Web for download [5]. 

Many experimental results are presented throughout. All timings were taken on a 
2.53GHz Pentium 4 with 512MB of RDRAM running Red Hat Linux 7.3. We used 
version 3.1.1 of the GNU C compiler with third-level general optimizations and all 
Pentium 4-specific optimizations enabled. 

To combat the state explosion problem, in addition to hashing — the main topic of this 
paper — explicit state model checkers use techniques such as partial order reductions [6, 
8] and symmetry reductions [3]. The improvements to bitstate verification discussed in 
this paper do not affect its compatibility with these techniques, but we have disabled 
reductions in all of our tests in order to easily measure accuracy. 

This paper is organized as follows. In Section 2 we give an overview of Bloom 
filters and show that they are quite sensitive to the number of hash functions used, 
e.g., the expected number of omissions when using two hash functions can be several 
orders of magnitude greater than the number of expected omissions when using the 
optimal number of hash functions. Bloom filters employing more than two hash functions 
were thought to be impractical because of the running time overhead they incur, but in 
Sections 3 and 4 we present new techniques to address this issue. In Section 5 we address 
the memory limitations imposed by the current implementation of bitstate verification 
in SPIN, version 4.0.7. We wrap up with experimental results incorporating all the 
techniques from the paper in Section 6, and give conclusions and future directions for 
the research in Section 7. 

2 Bloom Filters in Verification 

In this section we overview Bloom filters and consider in more detail the trade-offs 
involved in a Bloom filter and how these apply in the realm of verification. We also 
present some analysis that sets up a framework for evaluating our results. 

For the basics, we turn to Bloom himself [1]: 

[A Bloom filter] completely gets away from the conventional concept of 
organizing the hash area into cells. The hash area is considered as N individual 
addressable bits, with addresses 0 through N − 1. It is assumed that all bits in 
the hash area are first set to 0. Next, each message in the set to be stored is hash 
coded into a number of distinct bit addresses, say $a_1, a_2, \ldots, a_d$. Finally, all d 
bits addressed by $a_1$ through $a_d$ are set to 1. 

To test a new message a sequence of d bit addresses, say $a'_1, a'_2, \ldots, a'_d$, is 
generated in the same manner as for storing a message. If all d bits are 1, the 
new message is accepted. If any of these bits is zero, the message is rejected. 

In this paper, we refer to the d functions that produce the indices into the bit vector 
as the “index functions” and use m, k, and n to represent the size, in bits, of the Bloom 
filter, the number of index functions used, and the number of objects added to the Bloom 
filter, respectively. 

Although Bloom filters can be very compact, the downside is that when a membership 
query indicates that an element is in the Bloom filter, there is a certain probability of an 
error — that is, of a false positive. If we assume that the index functions are independent 
and uniform, then the probability that an index function does not select a specific bit is 
$p = 1 - \frac{1}{m}$. After inserting $i$ elements into the Bloom filter, the probability that a specific 
bit is still 0 is $p^{ki}$. Therefore, the probability of a false positive, after $i$ elements have 
been added to the Bloom filter, is $\left(1 - p^{ki}\right)^k$. 

While the false positive rate is the primary metric for evaluation and optimization 
in many applications of Bloom filters [2], the way we use Bloom filters in verification 
gives rise to two more meaningful metrics: the expected number of omissions and the 
probability of having no omissions. 

We compute the expected number of omissions when attempting to add n distinct 
states by adding the probability of a false positive when i states have already been added 
to the Bloom filter, as i ranges from 0 to n — 1. 

$$\sum_{i=0}^{n-1} \left(1 - p^{ki}\right)^k$$

To compute the probability of no omissions at all, we start by noting that in a Bloom 
filter containing $i$ elements, the probability that adding a new element does not lead 
to an omission is just 1 minus the probability of a false positive, $1 - \left(1 - p^{ki}\right)^k$. The 
probability of there not being an omission at all is just the product of these 
probabilities as $i$ ranges from 0 to $n - 1$: 

$$\prod_{i=0}^{n-1} \left(1 - \left(1 - p^{ki}\right)^k\right)$$
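
Both quantities are straightforward to evaluate numerically. The following Python sketch is illustrative only and is not taken from 3SPIN: the function names are hypothetical, the index functions are assumed to be ideal (independent and uniform), and "2MB" is assumed to mean 2 · 2^20 bytes. It evaluates the false positive probability (1 − p^(ki))^k with p = 1 − 1/m for each i, sums it to get the expected number of hash omissions, and multiplies the complements to get the probability of no omissions.

```python
import math

def false_positive_terms(m, n, k):
    """Yield (1 - p^(k*i))^k for i = 0 .. n-1, where p = 1 - 1/m."""
    pk = (1.0 - 1.0 / m) ** k   # p^k: one insertion leaves a given bit at 0
    q = 1.0                     # running value of p^(k*i)
    for _ in range(n):
        yield (1.0 - q) ** k
        q *= pk

def expected_omissions(m, n, k):
    """Sum of the false positive probabilities (hash omissions only)."""
    return math.fsum(false_positive_terms(m, n, k))

def prob_no_omissions(m, n, k):
    """Product of (1 - false positive probability), accumulated in log space."""
    logs = (math.log1p(-fp) for fp in false_positive_terms(m, n, k))
    return math.exp(math.fsum(logs))

if __name__ == "__main__":
    m = 2 * 2**20 * 8   # assumed: a 2MB filter, measured in bits
    n = 606_211         # states in the PFTP instance used in the text
    print(expected_omissions(m, n, 21), prob_no_omissions(m, n, 21))
```

Under these assumptions, the 2MB, k = 21 configuration should come out close to the 93.4% theoretical figure reported in Section 3.3.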

Both verification metrics, the number of expected omissions and the probability of 
no omissions, depend on m, n, and k. We have very little control over the values of m 
and n, as m is bound by the amount of memory we have available (the more the better) 
and n is the size of the transition system under consideration. Therefore, to obtain the 
best results for a fixed m and n, we have to choose the appropriate value of k. Figure 1 
shows that the expected number of omissions is quite sensitive to the number of index 
functions used (note that we use a log scale on the y-axis). We ran 3SPIN, calling the 
Jenkins hash function k times, and using a 1MB Bloom filter on an instance of the PFTP 
problem consisting of 606,211 states. We varied k from 1 to 32 and we report the actual 
number of omissions, averaged over 100 runs. Notice that the number of omissions when 
using two index functions is about two orders of magnitude greater than the number of 
omissions when using eleven index functions. 

The second curve in Figure 1 shows the number of expected omissions as given by 
the formula above for n equal to 606,211. There is a sizable gap between the two curves 
which at first one may think is due to less-than-ideal index functions, but the disparity is 
actually caused by a shortcoming of the theoretical analysis, which only considers one 
of two types of omissions. “Hash omissions” are those states that are omitted because 
of false positive Bloom filter queries. “Transitive omissions” are those states that are 
never reached because they are made unreachable by other omissions and, thus, are 
never queried against the Bloom filter. The gap in Figure 1 is mostly due to transitive 
omissions; i.e., in our implementation there is an observable number of states (out of 
the 606,211 in total) that are never even considered. The theoretical analysis is far from 
useless, however, for as the figure shows, minimizing the number of hash omissions 
tends to also minimize all omissions. 
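
The distinction between the two kinds of omission can be made concrete with a toy search. The sketch below is an illustration under stated assumptions, not 3SPIN code: the transition relation `successors` is a made-up graph in which every state is reachable from state 0, the index functions are salted SHA-256 digests standing in for independent hashes, and exact Python sets are kept alongside the Bloom filter purely to measure what was lost.

```python
import hashlib
from collections import deque

def indices(state, k, m):
    """k index values, one salted SHA-256 digest per index function."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{state}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m

def successors(state, n):
    """Hypothetical transition relation on states 0 .. n-1 (all reachable)."""
    return [(state + 1) % n, (state * 7 + 3) % n]

def bitstate_search(n, k, m):
    bits = bytearray(m)      # one byte per filter bit, for clarity only
    explored = {0}           # exact bookkeeping, used only for measurement
    queried = {0}
    for j in indices(0, k, m):
        bits[j] = 1
    queue = deque([0])
    while queue:
        s = queue.popleft()
        for t in successors(s, n):
            queried.add(t)
            idx = list(indices(t, k, m))
            if all(bits[j] for j in idx):
                continue     # filter claims t was visited, possibly falsely
            for j in idx:
                bits[j] = 1
            explored.add(t)
            queue.append(t)
    hash_omitted = queried - explored        # queried, but always rejected
    transitive = set(range(n)) - queried     # never queried at all
    return len(explored), len(hash_omitted), len(transitive)
```

With a deliberately tiny filter, for example `bitstate_search(1000, 2, 64)`, the filter saturates early and most states end up in one of the two omission categories; every state falls into exactly one of the three returned counts.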

More significantly, if the number of hash omissions is zero, the number of transitive 
omissions is also zero; consequently, the probability of no hash omissions is also the 
probability of no omissions altogether. Unlike expected omissions, the probability of 
no omissions matches almost exactly experimental results (see Table 1), justifying our 
“transitive omission” argument for the disparity in Figure 1 . 

We have seen that using the optimum number of index functions is very important 
in getting the most accuracy out of Bloom filters. While an analysis showing how to 
choose k is beyond the scope of this paper, a useful formula for estimating the best k 
given m and n is 

$$\left[\, 3.8^{n/m} \cdot \frac{m}{n} \ln 2 \,\right]$$

Fig. 1. We show the expected and observed omissions out of 606,211 states using 1MB for the 
Bloom filter, as k is varied. The theoretical optimum value for k is 11. 

This closed-form estimate for the best k in verification was derived by refining a formula 
from [2], $\frac{m}{n} \ln 2$, which estimates the k that minimizes the false positive rate. Validation 
of our formula’s accuracy will appear in future work. 

3 Double and Triple Hashing 

While Bloom filters employing more than two index functions can improve accuracy, 
they were thought to be impractical because of the running time overhead they incur. 
In this section we describe techniques for efficiently computing index values from just 
two or three hash values. Our techniques are similar to the “double hashing” scheme for 
collision resolution in open-addressed hash tables. While we give a short overview of 
double hashing below, a good reference is Chapter 11 of [4], and for a more complete 
account see [14,7]. 

3.1 Double Hashing Description 

Open addressing refers to a type of hashing where elements are stored directly in a hash 
table. To insert an element the hash table is probed until an empty location is found, 
and a query consists of probing the table until either the element is found or it is clear 
that the element is not in the table. The probing sequence is obtained by applying a 
sequence of hash functions to an element. Just as with Bloom filters, applying multiple 
hash functions can incur a significant performance penalty. Double hashing is an efficient 
way of implementing open addressing which only uses two hash functions to generate a 
probing sequence. The first value (call it x) is the starting index of the probing sequence. 
Given some index in the probing sequence, the next is obtained by adding the second 
value (call it y). The addition is done modulo the number of indices to ensure that the 
sum is also a valid index. 

Our double hashing scheme for Bloom filters is based on this idea: instead of using a 
sequence of index functions that are computed independently, use two functions, a and 
b, to compute values x and y, and use simple arithmetic on those values to generate all 
the required indices for each Bloom filter operation: 

x := a(d) MOD m 
y := b(d) MOD m 
f[0] := x 
i := 1 
while i < k 
    x := (x + y) MOD m 
    f[i] := x 
    i := i + 1 

Note that f[i] = (x + iy) MOD m, where x and y denote their initial values. Although in 
this pseudocode we store the index values into an array f, in actual code we would use the 
values as soon as they are computed, and, likewise, only compute as many values as are needed. 

We MOD a(d) and b(d) because we are assuming that a and b are stock hash 
functions that have not been tailored to output values in our index space. SPIN requires 
similar MOD operations to get index values from hash functions, but the designers chose 
to only allow Bloom filter sizes that are powers of 2 so that efficient bit masking can be 
used for the MOD operations. The implementation of the MOD operations is discussed in 
Section 5.2, in which we describe how to efficiently loosen the power-of-two restriction. 

As presented, the algorithm has a complication with respect to values of y. For 
example, if y = 0, only one unique index is probed. One way to fix this problem is to 
ensure that y is non-zero and relatively prime to m. The way 3SPIN deals with 
this issue is discussed in Section 5.3. 



3.2 Double Hashing Example 

It may not be clear to those who are not intimately familiar with Bloom filters that 
our double hashing scheme can actually give higher accuracy than using just two index 
functions. Figures 2, 3, and 4 demonstrate that boosting k with double hashing can lead 
to better accuracy. 

Fig. 2. We add data 79, 49, and 81 to a Bloom filter where k = 2 and m = 11. A collision occurs 
when 81 is queried/added. (We have set up the index functions to operate identically with or 
without double hashing when k = 2.) 

Fig. 3. [...] double hashing is used. A collision occurs when 14 is queried/added. 

Fig. 4. This is the same example as in Figure 3 except k = 4. No collisions occur. 



Figure 2 shows a Bloom filter to which elements 79, 49, and 81 are added. For 
this Bloom filter k = 2; that is, as many as two bits are set for each added element. 
If we interpret the figure as not using double hashing, the hash functions h0 and h1 
serve as the index functions, and are defined as h0(d) = d MOD 11 and h1(d) = 
(d DIV 11) MOD 11. We can make the k = 2 case of double hashing yield the same pair 
of indices for all inputs by making a(d) = h0(d) and b(d) = (h1(d) − h0(d)) MOD 11. 
Recall from the double hashing algorithm that we compute the index functions for double 
hashing with f_i(d) = (a(d) + i · b(d)) MOD m, and in our example m = 11. 

Adding 81 in Figure 2 does not change any bits in the Bloom filter and, thus, would 
have caused a hash omission if we were exploring a state space. If we boost k to 3 with 
double hashing, however, the collision is avoided and state 81 would not be omitted, as 
illustrated in Figure 3. 

Likewise, adding 14 in Figure 3 would lead to an omission. Figure 4 shows that 
if 4 double-hashed index functions had been used instead, there would have been no 
omissions. 
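
The example of Figures 2, 3, and 4 can be replayed in a few lines. This is a sketch for illustration, using the definitions from the text (h0(d) = d MOD 11, h1(d) = (d DIV 11) MOD 11, a = h0, b = (h1 − h0) MOD 11, m = 11); the helper names `double_hash_indices` and `first_collision` are our own.

```python
def h0(d):
    return d % 11

def h1(d):
    return (d // 11) % 11

def double_hash_indices(d, k, m=11):
    """Indices f_i(d) = (a(d) + i*b(d)) MOD m for i = 0 .. k-1."""
    x = h0(d) % m               # a(d) = h0(d)
    y = (h1(d) - h0(d)) % m     # b(d) = (h1(d) - h0(d)) MOD 11
    out = []
    for _ in range(k):
        out.append(x)
        x = (x + y) % m
    return out

def first_collision(data, k, m=11):
    """Add items in order; return the first one whose k bits are already set."""
    bits = [0] * m
    for d in data:
        idx = double_hash_indices(d, k, m)
        if all(bits[i] for i in idx):
            return d
        for i in idx:
            bits[i] = 1
    return None

print(first_collision([79, 49, 81], 2))        # Figure 2
print(first_collision([79, 49, 81, 14], 3))    # Figure 3
print(first_collision([79, 49, 81, 14], 4))    # Figure 4
```

With k = 2 the element 81 maps to bits 4 and 7, both already set by 49 and 79; with k = 3 its third index is 10, which is still clear, so the collision disappears.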

We have shown that using double hashing to implement more index functions can 
yield better accuracy than just using two hash values as indices, but more important 
is the degree of double hashing’s accuracy and how that accuracy compares to using 
independent hash functions. 

3.3 Double Hashing Accuracy 

To test the accuracy of double hashing with respect to the expected number of omissions, 
we ran 3SPIN on a 606,211-state instance of PFTP using both double hashing and 
independent hash functions, while varying k. Figure 5 contains the results, where each 
data point is obtained by averaging over 100 runs. Notice that the number of omissions 
that occur with double hashing is very similar to the number of omissions we get when 
using independent hash functions. Also, the best choice of k for the independent case 
seems to be the best for the double hashing case as well. 

Fig. 5. The above shows the number of omissions from a 606,211-state instance of PFTP when 
varying the number of index functions and the specified index function implementation. Each data 
point is the average of 100 iterations. The curve with +’s is the same as in Figure 1. 

To test the accuracy of double hashing with respect to the probability of no omissions 
at all, we ran 3SPIN on the same 606,211-state instance of PFTP with a 2MB Bloom 
filter using both double hashing and independent hash functions, with k set to 21. A 
theoretical analysis reveals that verification will be exhaustive 93.4% of the time, and as 
Table 1 demonstrates, double hashing performs very close to the theoretical expectation, 
though 4½ times faster than the independent hash function implementation. 

The competitive accuracy of double hashing breaks down, however, if one uses 
enough memory that using independent hash functions would have an almost 
undetectable probability of omissions, such as 1/16,000. Under such a setup, double hashing 
still has omissions about 2.5% of the time (see Table 3). This observation motivates 
something stronger than double hashing that has virtually the same speed. 

Table 1. We show the results of verifying a 606,211-state instance of PFTP using 2MB for the 
Bloom filter, 21 index functions, and the specified implementation of those index functions. Each 
data point is the average of 20,000 iterations. 

Implementation    Full coverage runs    Average running time 
Independent       93.281%               19.88 seconds 
Double Hashing    92.793%               4.43 seconds 
Theoretical       93.383%               N/A 

3.4 Triple Hashing 

In order to achieve very low probabilities of omission, we have extended the idea of 
double hashing to what we call triple hashing. The idea is to use a function c to compute 
a value z which modifies y, initially b(d), at each step. The implementation of 
triple hashing is an obvious extension to that of double hashing: 

x, y, z := a(d) MOD m, b(d) MOD m, c(d) MOD m 
f[0] := x 
i := 1 
while i < k 
    x := (x + y) MOD m 
    f[i] := x 
    y := (y + z) MOD m 
    i := i + 1 

Note that f[i] = (x + iy + (i(i − 1)/2)z) MOD m, where x, y, and z denote their initial 
values. The first of two intuitions that can 
explain the superiority of triple hashing is that we utilize more hash values, and thus 
collisions in the Bloom filter are less likely to occur. The second intuition is that because 
the function is more complicated, there is a smaller chance of several indices overlapping 
with several indices from a single previous addition. 
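
Unrolling the loop makes the shape of the index function explicit: x advances by y, and y advances by z, so after i steps the index is (x + i·y + (i(i − 1)/2)·z) MOD m in terms of the initial values. The Python sketch below is a check of that identity rather than 3SPIN code; the function names are hypothetical, and the inputs stand for the already-computed hash values a(d), b(d), and c(d).

```python
def triple_hash_indices(x, y, z, k, m):
    """Follow the pseudocode: x advances by y, and y advances by z, each step."""
    x, y, z = x % m, y % m, z % m
    out = [x]
    for _ in range(1, k):
        x = (x + y) % m
        out.append(x)
        y = (y + z) % m
    return out

def closed_form(x, y, z, i, m):
    """f[i] = (x + i*y + (i*(i-1)/2)*z) MOD m, using the initial x, y, z."""
    return (x + i * y + (i * (i - 1) // 2) * z) % m

if __name__ == "__main__":
    m, k = 1_000_003, 30
    loop = triple_hash_indices(12345, 678, 91011, k, m)
    direct = [closed_form(12345, 678, 91011, i, m) for i in range(k)]
    print(loop == direct)
```

Setting z = 0 reduces both functions to the double hashing sequence (x + i·y) MOD m.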

This pseudocode for triple hashing suggests triple hashing would have significantly 
more per-k overhead than double hashing would, but most of the per-k overhead in 
double and triple hashing comes from main memory latency. Table 2 demonstrates this. 
The overhead of triple hashing vs. double hashing at k = 20 is not nearly enough to make 
(Double, k = 21) faster than (Triple, k = 20), assuming we do not have to compute any 
more hash values, an assumption that is addressed in Section 4. 

Finally, in Table 3 we show that triple hashing can achieve much higher accuracy than 
double hashing. Triple seems to come much closer to what we expect from independent 
hash functions, but triple hashing is, of course, much faster than using independent hash 
functions, as Table 2 confirms. 

4 The Jenkins Hash Function 

The previous section gave an efficient way to reduce the problem of computing k index 
values from a state to the problem of computing just two or three. In this section we 
show how to get the most data out of a popular hash function and compute these two or 
three values in much less time than the default configuration of SPIN 4.0.7 computed 
the two hash values for k = 2 bitstate hashing. 

Table 2. We present the running times of verifying a 606,211-state instance of PFTP using 2MB 
for the Bloom filter, the specified number of index functions, and the specified implementation 
of those index functions, although all implementations did the same amount of hash computation 
whether needed or not. 

Implementation    Index functions    Average running time 
Double            21                 3.78 seconds 
Triple            21                 3.84 seconds 
Double            20                 3.61 seconds 
Triple            20                 3.73 seconds 
Independent       21                 9.36 to 19.88 seconds (see Table 4) 

Table 3. We show the results of verifying a 606,211-state instance of PFTP using 3MB for the 
Bloom filter, 30 index functions, and the specified implementation of those index functions. We use 
100,000 iterations for each implementation, which is insufficient for quantifying the accuracies 
with any precision, but does give strong indication of the magnitudes. 

Implementation    Proportion of runs with any omissions 
Double Hashing    1 in 40 
Triple Hashing    1 in 10,000 
Theoretical       1 in 16,352 



4.1 Getting the Most from Jenkins 

Because of its high quality and fast speed, Bob Jenkins’ LOOKUP2 hash function [13, 
12] is a popular choice among implementers of hash tables and Bloom filters; after all, 
the function is the default hash function in SPIN 4.0.7. Even though it only produces a 
32-bit value, the function is often used to produce larger values or sequences of values 
by calling the function multiple times with different seed values. 

If we take a look at the LOOKUP2 function, however, we see that it propagates a 
full 96 bits of data as it iterates over the input. What the function returns is just a 32-bit 
fragment of the propagated 96 bits. Although the word returned is the only one that 
satisfies certain properties that can be tested in Jenkins’ lookup2.c [12], we have found 
that for our purposes, extracting all three words from a single run of Jenkins is about as 
good as calling the function three times. 

4.2 Accuracy Validation 

First we ran tests on Jenkins to make sure that each of the three output words is uniform 
on its own. If this were not the case, two sufficiently large¹, unique, randomly-chosen 
inputs would have better than a 1 in $2^{32}$ chance of producing the same 32-bit word of 
output. Equivalently, a 32-bit output is not uniform if inputs have better than a 1 in $2^{8}$ 
chance of their output matching one of $2^{24}$ unique outputs. Running exactly this test 
repeatedly for each output word yielded probabilities that quickly converged at 1 in $2^{8}$, 
as desired. 

Next we sought to evaluate pairwise independence among the three pairings of 
output words. We followed a similar procedure to that above, except we were attempting 
to establish the uniformity of a 64-bit output. Observing only one repeated 64-bit output, 
we were unable to put an upper bound on the entropy in the output, but our results indicate 
the entropy is likely greater than 60 bits for each pair of words, leaving no doubt that 
extracting more than one word from Jenkins gives us access to substantially more hash 
information, if not a full 96 bits. 

From a more practical standpoint, we ran tests to validate that using all three words 
from each call to Jenkins gives about the same accuracy in a Bloom filter as calling 
Jenkins three times. Table 4 shows the results of 20,000 executions each for two versions 
of SPIN, both of which use 21 index functions. The “Slow Jenkins” version uses separate 
calls to Jenkins for each index function — up to 21 calls for each Bloom filter operation. 
The “Fast Jenkins” version uses three words from each call to Jenkins and, thus, incurs 
a maximum cost of seven Jenkins calls per operation. We actually observed slightly 
higher accuracy with “Fast Jenkins,” but the results are not statistically significant enough 
to establish that relationship. The results do establish that both implementations yield 
accuracy exceptionally close to what is expected in theory. 

NOTE: These tests do not utilize double or triple hashing; the combination of all 
techniques is tested and validated in Section 6. 



Table 4. We show the results of verifying a 606,211-state instance of PFTP using 2MB for the 
Bloom filter, 21 index functions, and the specified implementation of those index functions. We 
ran 20,000 iterations of each implementation. 

Implementation    Full coverage runs    Average running time 
Slow Jenkins      93.281%               19.88 seconds 
Fast Jenkins      93.339%               9.36 seconds 
Theoretical       93.383%               N/A 

¹ We tested using seven words of input. 

4.3 Speed Boost 

Table 4 also includes execution times for the “Slow Jenkins” and “Fast Jenkins” 
versions. Hash computation clearly dominates the total execution cost, because 
a 67% reduction in hash computation time resulted in a 53% reduction in overall required 
execution time. 

The “Fast Jenkins” version utilizing three index functions runs more quickly than 
the Jenkins configuration of SPIN 4.0.7, which uses two index functions, because “Fast 
Jenkins” can generate about three index functions with a single call to Jenkins. The 
results of these tests are in Table 5. The Jenkins configuration of SPIN 4.0.7 is the k = 2 
case of what we have been calling “Slow Jenkins”. 



Table 5. We show the execution times for verifying a 606,211-state instance of PFTP using 2MB 
for the Bloom filter. The number and implementation of the index functions is indicated in the 
table. 

Implementation    Index functions    Average running time 
SPIN Jenkins      2                  2.57 seconds 
Fast Jenkins      2                  1.86 seconds 
Fast Jenkins      3                  2.09 seconds 



5 Arbitrary Memory Utilization 

Two restrictions on the amount of memory that can be utilized by a Bloom filter in SPIN 
can have profound effects on the accuracy of bitstate verification. The first and most 
clearly debilitating limitation is the upper limit on the amount of memory that can be 
dedicated to a Bloom filter, 512 Megabytes². The second limitation is that the Bloom 
filter in SPIN can only be sized to be a power of two. The impact of both limitations is 
great: theoretical analysis shows that a user of accurate bitstate verification who dedicates 
768MB of memory to the Bloom filter instead of 512MB could have a 1 in 10,000 chance 
of any omissions instead of 1 in 5. 

5.1 Increasing the Maximum 

The reason for SPIN’s maximum of 512 Megabytes dedicated to the Bloom filter is that 
512 Megabytes is equal to $2^{32}$ bits, and 32-bit values are used to index into the bit 
vector. The problem is that as byte- or word-addressed memories get close to the size 
of their address space, single words become insufficient for addressing individual bits 
across most of memory. The computer market is currently experiencing this problem, in 
which many 32-bit machines are sold with more than 512 Megabytes of memory. 

² SPIN 4.0.7 would actually only work with up to 256MB for us, but analysis suggests that this 
is an implementation bug and not a design flaw. 

Any solution to this problem would almost certainly involve more computation, so the 
best solution is likely to be one that eliminates the need for some existing computation. 
An example of such existing computation is the process of dividing a bit vector index into 
a word or byte index and an index of the bit within that word or byte. These operations 
boil down to dividing by some small power of two and taking the modulus with that same 
power of two, which can be implemented with bit shifting and bit masking, respectively. 

Our solution, which we call “parallel indexing,” accommodates any addressable 
amount of memory and eliminates a little bit of previously required per-k computation 
by computing word indexes and bit-within-word indexes independently. Consider having 
two sets of index functions: $F_0, F_1, \ldots, F_{k-1}$ give the addresses of the words to retrieve 
and $f_0, f_1, \ldots, f_{k-1}$ tell which bit to extract from each word. We can apply triple hashing 
on an A, B, and C to get the $F_i$ values and the same on a, b, and c to get the $f_i$ values. 
Because none of these functions is ever required to return more bits than can be stored 
in a word, the computation is simple. 

5.2 Precise Granularity 

Modifying SPIN to use any specified amount of memory for the Bloom filter is simple, 
but ensuring that accuracy is maintained and that the implementation is efficient is not 
as easy. The simple answer to using any amount of memory is to allocate that much, 
and then MOD hash function results by the appropriate value whenever indexes are 
computed. 



Accuracy. The first problem with the simple answer is that MOD-ing by any value 
can affect the accuracy of the data structure. Consider a case in which one is not using 
“parallel indexing” and allocates about ⅔ of $2^{32}$ bits, about 341MB, for the Bloom 
filter. If we MOD the result of a 32-bit hash function to get an index, the first half of the 
indexes are twice as likely to be chosen as the second half. We can think of the MOD 
operation as putting the input values into m equivalence classes. No matter how hard 
we try to make the distribution among classes more uniform than what MOD gives us, 
if we have 50% more elements than equivalence classes, half of the classes are going to 
contain two elements and half are going to contain one element. 
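
The size of this bias can be computed exactly by counting, for each index r, how many 32-bit hash values reduce to r. The snippet below is a small illustration with a hypothetical helper name, `preimages`; it assumes m is two thirds of 2^32 rounded down, matching the roughly 341MB example above.

```python
def preimages(r, m, bits=32):
    """How many values v in [0, 2**bits) satisfy v MOD m == r."""
    total = 2 ** bits
    if r < 0 or r >= m or r >= total:
        return 0
    return (total - 1 - r) // m + 1

m = (2 * 2**32) // 3      # about two thirds of 2^32 bits (~341MB)
boundary = 2**32 - m      # indexes below this are hit by two hash values

print(preimages(0, m), preimages(m - 1, m))
print(boundary / m)       # fraction of indexes that are twice as likely
```

Indexes below `boundary` each have two preimages and the rest have one, and `boundary` is almost exactly half of m, which is the "first half twice as likely" situation described in the text.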

Our choice of indexing words as opposed to bytes in the parallel indexing scheme 
lessens the impact of the uniformity problem by a factor of four (in the 32-bit case), 
making the problem unlikely to ever have an observable impact. Whereas byte indexing 
gave a worst case of some indexes being twice as likely to be chosen as others, word 
indexing yields a worst case of some being 25% more likely. So even if m is a few 
Gigabytes, the difference is not significant, as Table 6 reveals. 



Speed. The simple solution’s second problem is that MOD operations on arbitrary values 
are much more costly than, for example, taking a modulus with respect to a power of two, 
which can be implemented with bit masking. In fact, outside of SPIN we have observed 
C’s unsigned modulus operator to be ten times as slow as bit masking on a Pentium 4. 
Which MOD operations can we optimize away if using double or triple hashing and the 
parallel indexing scheme? 

Table 6. We show the result of verifying a 723,035-state instance of LEADER using 2MB for the 
Bloom filter and 17 independent index functions, while varying the ratio of the probability of an 
index landing in the first half of the bit space over the second. Each data point is the average of 
420 iterations. 

Case             Ratio    % of full coverage runs 
Byte indexing    2:1      16.16% 
Word indexing    5:4      39.29% 
Uniform          1:1      41.19% 



The first observation is that the range of the bit-within-word indexes is always 
a power of two and, thus, can be optimized with bit masking. 

While it is important for the a values in computing word indexes to have a fairly 
uniform distribution over all possible indexes, cheating on b and c does not sacrifice 
as much. In fact, we can MOD with respect to the greatest power of 2 less than m to 
compute values for b and c, enabling us to use bit masking for these. 

Although we have reduced the number of unoptimized MOD operations for the 
initialization phase of each Bloom filter operation to just one (computing a), the most 
important MOD operations to optimize are those that happen within the iteration part 
of each Bloom filter operation, executing as many as k times for each Bloom filter 
operation. 

In the triple hashing case, we can cheat even further on the ranges of values for b and 
c and eliminate the MOD for y := y + z altogether. More specifically, if b + c·k < m 
then y (initialized to b) will never overflow with respect to m, because y only needs to 
be incremented by z (whose value is c) k − 1 times. 

The following observation allows us to speed up the MOD in x := (x + y) MOD m: on 
each iteration of the loop we are guaranteed that 0 ≤ x < m and 0 ≤ y < m. Thus, 
0 ≤ x + y < 2m, leaving only two cases to handle: (x + y) MOD m = x + y (if 
x + y < m) and (x + y) MOD m = x + y − m (otherwise; m ≤ x + y < 2m). We 
update this line from the pseudocode to reflect the optimization: 

x := (x + y) MOD m 

to be these lines: 

x := x + y 

if (x >= m) then x := x - m 
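
As a sanity check, the optimized loop can be compared against the straightforward one. The sketch below is our own illustration with hypothetical function names; it assumes 0 ≤ x < m and the restricted ranges for y and z described above (y + z·k < m), so that y never needs to be reduced and a single conditional subtraction suffices for x.

```python
def indices_with_mod(x, y, z, k, m):
    """Reference version: reduce modulo m on every update."""
    out = [x]
    for _ in range(1, k):
        x = (x + y) % m
        out.append(x)
        y = (y + z) % m
    return out

def indices_no_mod(x, y, z, k, m):
    """Optimized version; requires 0 <= x < m, y, z >= 0 and y + z*k < m."""
    assert 0 <= x < m and y >= 0 and z >= 0 and y + z * k < m
    out = [x]
    for _ in range(1, k):
        x += y
        if x >= m:      # x + y < 2m, so one subtraction is enough
            x -= m
        out.append(x)
        y += z          # stays below m by assumption, so no reduction
    return out

if __name__ == "__main__":
    m = 3 * 2**20 + 7   # an arbitrary size that is not a power of two
    print(indices_no_mod(m - 1, 1000, 37, 27, m)
          == indices_with_mod(m - 1, 1000, 37, 27, m))
```

The design choice mirrors the text: the only remaining general modulus is the one used to bring the initial x into range, which happens once per Bloom filter operation rather than once per index.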

The new code is much more efficient, as the graph in Figure 6 shows. The faster version 
in the graph implements triple hashing and all the optimizations discussed in this section, 
requiring just one unoptimized modulus per Bloom filter operation. The slower version 
does not use the optimization just described, requiring up to k unoptimized modulus 
operations per Bloom operation. 

One thing to notice about the times reflected in the graph is that whenever the memory 
space is a power of two, both implementations run at the same slightly faster speed. Our 
modified version of SPIN dynamically picks the implementation best suited for the 
choices of m and k. There are implementations optimized for when m is a power of two 
and, orthogonally, for when k < 2. 

Fig. 6. This graph plots verification times for PFTP5 (n = 606,211) against the memory 
available to the Bloom filter (MB), and depicts the difference in execution times resulting 
from optimizing k − 1 modulus operations per Bloom operation. k is varied from 1 to 27 to 
be optimal with respect to m/n. 

From the observation that the power of two cases are optimized in the graph, we see 
that even after our optimizations for utilizing arbitrary memory, we can still incur an 
execution speed cost of up to a few percent. Such an overhead is likely to be well worth 
the cost if it enables someone to use nearly twice as much memory. 

5.3 With Our 96-Bit Jenkins 

In this short section we reveal the synergy between the various approaches to improving 
the speed and accuracy of bitstate verification in SPIN. 

With the 96 bits we get from a single call to Jenkins, we have enough hash information
to effectively utilize triple hashing, parallel indexing, and precise memory utilization.
We call our version incorporating all of these enhancements “Triple SPIN,” or 3SPIN
for short. Figure 7 has the precise breakdown of hash information used in 3SPIN.




Fig. 7. The above diagram shows how we utilize the 96 bits of output from Jenkins. 








P.C. Dillinger and P. Manolios 



A, B, and C give 3SPIN triple hashing on the word indexes, and a and b give double 
hashing on the bit-within-word indexes. Notice that we only use 4 bits for b even though 
it could be a 5-bit value; the reason is that making b odd (by multiplying by two and 
adding one) ensures that every bit-within-word index is unique up to k = 32. This 
guarantee ensures that each Bloom filter operation addresses k unique bit positions in 
the bit vector. 
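The uniqueness claim for an odd stride can be checked with a small C sketch. The function names are hypothetical, and the code models only the bit-within-word walk (start a, stride 2b + 1, taken modulo 32); how 3SPIN combines this walk with the word indexes is not shown here. An odd stride is coprime to 32, so 32 consecutive steps visit all 32 positions, whereas an even stride would repeat early.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the distinct bit-within-word positions (0..31) visited by k steps
 * of double hashing with start position a and the given stride. */
unsigned distinct_with_stride(unsigned a, unsigned stride, unsigned k)
{
    uint32_t seen = 0;
    unsigned count = 0;
    unsigned pos = a;

    assert(a < 32 && k <= 32);
    for (unsigned i = 0; i < k; i++) {
        uint32_t bit = (uint32_t)1 << pos;
        if (!(seen & bit)) {
            seen |= bit;
            count++;
        }
        pos = (pos + stride) & 31;   /* mod 32 by bit masking */
    }
    return count;
}

/* b4 holds the 4 hash bits kept for b; the stride is forced odd. */
unsigned distinct_bit_positions(unsigned a, unsigned b4, unsigned k)
{
    assert(b4 < 16);
    return distinct_with_stride(a, 2 * b4 + 1, k);
}
```

With any 4-bit b4 the full k = 32 walk reaches 32 distinct positions, while an even stride such as 2 reaches only 16.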

6 Overall Evaluation 

In this section we evaluate 3SPIN, the system incorporating all the techniques described
in this paper. Figure 8 shows that the observed average omissions for the various
implementations are so close that it is hard to detect any differences. As previous tests have
shown, we would need many high-accuracy runs to have a chance of distinguishing the
implementations based on accuracy.

Figure 8 also shows the execution times for the tests. Unlike the number of omissions, 
the execution times are profoundly different, with our techniques taking about 1/4 the 
time of the implementation not taking advantage of our improvements when k = 14. 
Notice also that our k = 14 takes less than twice as much time as our k = 2 — a far cry 
from Holzmann’s experience with independent hash functions [11,10], which suggests 
k = 14 to be seven times as slow. 

Figures 9 and 10 show the results when various amounts of memory are available
for allocation to the Bloom filter. Notice that the versions that do not incorporate any of
our enhancements for arbitrary memory allocation to the Bloom filter can only utilize an
amount of memory equal to the greatest power of two not greater than m. For example,
when 48MB is available, the unenhanced versions act as if only 32MB is available,
because that is the most the user can specify without requiring more than 48MB. If
only 32MB is available, all versions using the best k (14 in this case) expect around
100 omissions, but when 48MB is available, 3SPIN expects about 1/100th as many
omissions. Even though 3SPIN is using k = 21 to make best use of the 48MB, it runs
in about 2/3rds the time. If, once again, only 32MB were available, 3SPIN would run in
about half the time of the version with independent hash functions.

The version using two independent index functions was included in Figures 9 and 10
to reflect what is available in the latest version of SPIN, 4.0.7. According to these results,
3SPIN can utilize about 7 index functions (m = 14MB in this case) as fast as SPIN 4.0.7
can utilize two, and at that point 3SPIN expects about 1/13th as many omissions, partially
because it is utilizing more memory and partially because it is using a more suitable k.

Our last set of experimental results (Table 7) confirms that our results generalize
to models other than those we have used in the rest of the paper.

7 Conclusions and Future Work 

Early work by Holzmann and others has shown the utility of the Bloom filter data
structure for probabilistically verifying systems with explicit state model checkers [9].
The main parameter for tuning a Bloom filter is the number of hash functions used, k,
but there is a tension between accuracy and efficiency, as small values of k lead to fast
running times, but the value of k that yields the best accuracy may be quite large. SPIN
is optimized for speed, and, thus, it only allows k to be 1 or 2. Holzmann justified this
choice by pointing out that running a well-tuned model checker with 2 hash functions
can be j/2 times faster than using j hash functions. The belief was that one could get
accuracy or efficiency, but not both.

Fig. 8. On the left we have plotted the number of omissions from a 14,536,469-state instance of
PFTP(D=1,Q=2) using 32MB for the Bloom filter and k values up to 14, the optimal for this m
and n. The right shows the execution times for the same tests. Each data point represents about 5
iterations.

Fig. 9. Here we have plotted the number of omissions from PFTP(D=1,Q=2) for various
implementations and various amounts of memory available to the Bloom filter. Notice that
implementations only supporting power-of-two granularity will only utilize the greatest power of
two less than or equal to the amount available. Each data point is the average over about 20
iterations.

Table 7. Validation of our approaches using models other than PFTP. In each case, all our
techniques and optimizations are used. The k values are annotated with either “(opt)”, indicating
that the choice was optimal for m and n, or “(sub)”, indicating we chose a k different from the
optimal. All these models are included in the SPIN distribution.

Model      States     m     k         % runs full  Expected  Iterations
Peterson4  7,308,888  32MB  25 (opt)  99.11%       99.15%    336
Leader7      723,035   3MB   8 (sub)  77.34%       75.69%    331
Sort9      2,509,313   8MB  20 (opt)  59.88%       63.38%    329

Fig. 10. This graph shows times for the executions in Figure 9. Note that four implementations use
the optimal values for k, which range from 1 to 27 depending on m. The other implementation,
SPIN 4.0.7, always uses k = 2, which explains why it becomes the fastest at m = 14MB.

We show that you can have your cake and eat it too. We have entitled this paper “Fast
and Accurate Bitstate Verification for SPIN,” because that is exactly what we provide
with 3SPIN, a system we developed by modifying SPIN 4.0.7. Key components of 3SPIN
include our double and triple hashing techniques for Bloom filters, which greatly reduce
the execution time of highly-accurate bitstate verification. In fact, 3SPIN can use about
7 hash functions while running as fast as SPIN (using 2 hash functions).

3SPIN also has the ability to use as much main memory for the Bloom filter as is
available, whereas SPIN only allows the size of the Bloom filter to be a power of 2, up to
512MB. The motivation behind this improvement is simple: using more available main
memory for the Bloom filter always improves the expected accuracy of a bitstate search.
For example, by using just 50% more memory than SPIN allows, we can be 2,000 times
less likely to have an omission.

For future work, we plan to explore the use of Bloom filters in verification from
a more analytical standpoint and to examine the impact of this work on techniques
such as sequential multihashing [10]. We also plan to compare our techniques to other
probabilistic verification techniques such as hash compaction [15,16].

Abstract. In the classic approach to logic model checking, software verification
requires a manually constructed artifact (the model) to be written in the language
that is accepted by the model checker. The construction of such a model
typically requires good knowledge of both the application being verified and of
the capabilities of the model checker that is used for the verification. Inadequate
knowledge of the model checker can limit the scope of verification that
can be performed; inadequate knowledge of the application can undermine the
validity of the verification experiment itself.

In this paper we explore a different approach to software verification. With this 
approach, a software application can be included, without substantial change, 
into a verification test-harness and then verified directly, while preserving the 
ability to apply data abstraction techniques. Only the test-harness is written in 
the language of the model checker. The test-harness is used to drive the appli- 
cation through all its relevant states, while logical properties on its execution are 
checked by the model checker. To allow the model checker to track state, and 
avoid duplicate work, the test-harness includes definitions of all data objects in 
the application that contain state information. 

The main objective of this paper is to introduce a powerful extension of the 
SPIN model checker that allows the user to directly define data abstractions in 
the logic verification of application level programs. 



1. Introduction 

In the classic approach to software verification based on logic model checking tech- 
niques, the verification process begins with the manual construction of a high-level 
model of a source program. The advantage of this approach is that the model can 
exploit a broad range of abstraction techniques, which can significantly lower the ver- 
ification complexity. The disadvantage is that the construction of the model requires 
not only skill in model building, but also a fairly deep understanding of the function- 
ing of the implementation level code that is the target of the verification. Any misun- 
derstanding translates into a loss of accuracy of the model and thereby into a loss of 
accuracy of the verification. If errors are found in the model checking process, these 
misunderstandings can often be removed, but if no errors are found the user could 
erroneously conclude that the application was error free. 

In this paper we describe a new verification method that in many cases of practical
interest can avoid the need to manually construct a verification model, while still
retaining the capability to define powerful abstractions that can be used to reduce
verification complexity. We call this method “model-driven software verification.”

In Section 2 we discuss the basic method of using embedded C or C++ code within 
SPIN verification models, and we discuss a relatively small extension that was intro- 
duced in SPIN version 4.1 to support data abstraction on embedded C code. Section 3 
contains a discussion of two example applications, Section 4 reviews related work, 
and Section 5 presents our conclusions. 

2. Model Checking with Embedded C Code 

SPIN versions 4.0 and later support the inclusion of embedded C or C++ code within 
verification models [8,9]. A total of five different primitives can be used to connect a 
verification model to implementation level C code. Some of these primitives serve to 
define what the state of the model is, with optionally some of the state information 
residing in the application. Other primitives serve to define either conditional or 
unconditional state transitions. 

One of the primitives that can be used to define state is c_decl, which is normally 
used to introduce the types and names of externally declared C data objects that are 
referred to in the model. Another primitive of this type is c_track, which is used to 
define which of the data objects that appear in the embedded C code should be con- 
sidered to hold state information that is relevant to the verification process. 

Two other primitives define state transitions with the help of C code. The first of these 
is c_code, which can be used to enclose an arbitrary fragment of C code that is used 
to effect the desired state transition. The last primitive we discuss here is c_expr, 
which can be used to evaluate an arbitrary side-effect free expression in C to compute 
a Boolean truth value that is then used to determine the executability of the statement 
itself. 

Figure 1 illustrates the use of these four primitives with a small example. 

c_decl {
    extern float x;
    extern void fiddle(void);
};

c_track "&x" "sizeof(float)";

init {
    do
    :: c_expr { x < 10.0 } -> c_code { fiddle(); }
    :: else -> break
    od
}

Fig. 1. Embedded C Code Primitives. 

The first statement in this example introduces definitions of an externally declared
floating point variable named x and an externally declared function named fiddle().
Presumably, the variable holds state information we are interested in, and the
function defines a state transition. To record the fact that the external variable holds
state information that must be tracked by the model checker, the c_track primitive is
used to provide the model checker with two salient facts: the address of the variable
and its size in bytes. The model checker will now instrument the verification engine
to copy the current value of variable x into the state descriptor after any transition in
which its value could have changed (i.e., after every execution of a c_code statement).
Similarly, whenever the verifier performs a backtracking step for any statement
that could have changed the value of the variable, the code of the verifier is
again instrumented to reset the value of x to the copy of its previous value that was
stored in the state descriptor for the earlier state.

The init process in this example model will now repeatedly call the external function
fiddle() until it sees that the value of floating point variable x is no longer less
than 10.0, after which it will stop.
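To make the example concrete, the external C side could look like the sketch below. The definition of x, the body of fiddle(), and the helper run_init() are all hypothetical stand-ins; the paper only declares x and fiddle() as external. The helper mirrors the loop in Figure 1 in plain C: the guard x < 10.0 plays the role of the c_expr, and the call plays the role of the c_code transition.

```c
#include <assert.h>

float x = 0.0f;                 /* hypothetical definition for "extern float x" */

void fiddle(void)               /* hypothetical state transition */
{
    x += 2.5f;
}

/* Mirrors the do-loop of Figure 1: call fiddle() while x < 10.0 holds,
 * then stop. Returns the number of calls made. */
int run_init(void)
{
    int calls = 0;

    x = 0.0f;
    while (x < 10.0) {
        fiddle();
        calls++;
        assert(calls < 1000);   /* guard against a stub that never raises x */
    }
    return calls;
}
```

With this stub, x takes the values 2.5, 5.0, 7.5, and 10.0, so the loop stops after four calls.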

Effectively, these extensions allow us to introduce new datatypes in verification models,
well beyond what PROMELA supports, and they allow us to define new types of
transitions, with SPIN performing the normal model checking process.

It is important to note here that the c_track primitive, as used here, supports two 
separate goals: state tracking and state matching. 

□ State tracking allows us to accurately restore the value of data objects to their 
previous states when reversing the execution of statements during the depth- 
first search process. 

□ State matching allows us to recognize when a state is revisited during the 
search. When a state is revisited, the model checker can immediately backtrack 
to a previous state to avoid repeating part of the search that cannot yield new 
results. 

We will see shortly that if we modify the way in which state information can be 
matched, while retaining accurate tracking, we can define powerful abstractions that 
can significantly reduce the complexity of verifications. 

2.1. Tracking without Matching 

There are cases where the value of an external data object should be tracked, to allow 
the model checker to restore the value of these data objects when backtracking, but 
where the data object does not actually hold relevant state information. It could also 
be that the data object does hold state information, but contains too much detail. In 
these cases we would benefit from defining abstractions on the data that are used in 
state matching operations, while retaining all details that are necessary to restore state 
in the application in tracking operations. 

A relatively small extension that makes it possible to do this was included in SPIN, 
starting with version 4.1. The extension is to support an additional, and optional, 
argument to the c_track primitive that specifies whether the data object that is 
referred to should be matched in the statespace. There are two versions: 

c_track "&x" "sizeof(float)" "Matched";

and

c_track "&x" "sizeof(float)" "UnMatched";

with the first of these two being the default if no third argument is provided (and 
backward compatible with the original definition of the c_track primitive in [9]). 

The value of unmatched data objects is saved on the search stack, but not in the state 
descriptor. 

The resulting SPIN nested depth-first search algorithm is identical to the one
discussed in [1] in the context of a discussion on symmetry reduction. For convenience,
Figure 2 reproduces the algorithm as it was discussed in [1] (see [9] for a
more basic description of the standard nested depth-first search algorithm). The
abstract representation of a state is computed here by abstraction function f(s).

proc dfs1(s)
    add s to Stack1
    add {f(s),0} to States
    for each transition (s,a,s') do
        if {f(s'),0} not in States then dfs1(s') fi
    od
    if accepting(f(s)) then seed := {f(s),1}; dfs2(s,1) fi
    delete s from Stack1
end

proc dfs2(s)    /* nested search */
    add s to Stack2
    add {f(s),1} to States
    for each transition (s,a,s') do
        if {f(s'),1} == seed then report cycle
        else if {f(s'),1} not in States then dfs2(s') fi
    od
    delete s from Stack2
end

Fig. 2. Nested Depth-First Search with Abstraction (cf. Fig. 8 in [1]).

2.2. Validity of Abstractions 

The extension of the c_track primitive allows us to include data in a model that has 
relevance to the accurate execution of implementation level code, but no relevance to 
the verification of that code. 

The simplest use of this new option is to use it to track data without storing it in the 
model checker’s state-vector, where before the only way to track it would have been 
to do just that. When used in this way, the use of unmatched c_track primitives 
equates to data hiding. 

Another use is to use unmatched c_track statements to hide the values of selected 
data objects from the state-vector, and then to add abstraction functions (implemented 
in C) to compute abstract representations of the data that will now be matched in the 
state-vector, using additional matched c_track primitives. We can now achieve true 
abstractions, though of course only of the value of data objects. 

As a simple example of the latter type of use, consider two implementation level inte- 
ger variables x and y that appear in an application program. Suppose further that the 
absolute value of these two variables can be shown to be irrelevant to the verification 
attempt, but the fact that the sum of these two variables is odd or even is relevant. We 
can now setup the required data abstraction for this application by defining the follow- 
ing c_track primitives: 

/* data hiding */
c_track "&x" "sizeof(int)" "UnMatched";
c_track "&y" "sizeof(int)" "UnMatched";

/* abstraction: */
c_track "&sumxy" "sizeof(unsigned char)" "Matched";

and we add the abstraction function:

c_code {
    void abstraction(void) { sumxy = (x+y)%2; }
}



which should now be called after each state transition that is made through calls on 
the application level code. 

The abstractions have to be chosen carefully, to make sure that they preserve the logi- 
cal soundness and completeness of the verification. This is clearly not true for all 
possible abstractions one could define with the mechanism we have discussed. Con- 
sider, for instance, the abstraction above in the context of the following model: 

init {
    c_code { x = y = 0; abstraction(); };
    do
    :: c_code { x = (x+1)%M; y = (y+1)%N; abstraction(); }
    od
}

Suppose we are checking the invariant that x + y is even. This invariant holds for the 
model above if both M and N are even, but not in general. For instance, it does not 
hold for the case M = 3, N = 4. In this case, the model checker would stop the search 
after exploring the first transition, and erroneously declare that the invariant holds. In 
the next subsection, we describe a sufficient condition for ensuring that the verifica- 
tion is sound with respect to the abstraction, for a given model. 
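The failure described for M = 3, N = 4 can be reproduced with a small stand-alone C simulation. This is a sketch of the search behaviour only, not SPIN itself: finds_violation() and the MAXMOD bound are hypothetical. Because the model has a single transition per state, the search degenerates to following one path until a state is matched against a previously stored one.

```c
#include <assert.h>
#include <string.h>

#define MAXMOD 16   /* arbitrary bound on M and N for this sketch */

/* Returns 1 if the search reaches a state where x + y is odd, else 0.
 * use_abstraction != 0: states are matched on sumxy = (x+y)%2 only.
 * use_abstraction == 0: states are matched on the concrete pair (x, y). */
int finds_violation(int M, int N, int use_abstraction)
{
    unsigned char seen_abs[2] = {0, 0};
    unsigned char seen[MAXMOD][MAXMOD];
    int x = 0, y = 0;

    assert(M >= 1 && M <= MAXMOD && N >= 1 && N <= MAXMOD);
    memset(seen, 0, sizeof seen);

    for (;;) {
        int sumxy = (x + y) % 2;        /* the abstraction function */

        if (sumxy != 0)
            return 1;                   /* invariant "x + y is even" fails */
        if (use_abstraction) {
            if (seen_abs[sumxy])
                return 0;               /* matched an old state: stop */
            seen_abs[sumxy] = 1;
        } else {
            if (seen[x][y])
                return 0;
            seen[x][y] = 1;
        }
        x = (x + 1) % M;                /* the only transition of the model */
        y = (y + 1) % N;
    }
}
```

With M = 3 and N = 4, the concrete search follows (0,0), (1,1), (2,2), (0,3) and reports the odd sum in the last state, while the abstracted search matches (1,1) against the stored initial state and stops without finding it.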



2.3. Sufficient Conditions for Soundness 

The model-checker checks properties by exploring the set of reachable states, starting
from a predetermined initial state. Exploring a state consists of enumerating its
successors, determining which of these states are potentially relevant, and recording the
newly encountered states in some data-structure. This data-structure is, for instance, a
stack in a depth-first search, and it is a queue in a breadth-first search.

Given a symmetric relation ~ on concrete states, a state s is considered relevant if the 
search has not seen any state t such that s~t. In the following discussion, we will 
refer to states that have been visited in the search at least once as encountered states, 
and to those encountered states whose complete set of successors has been computed 
as explored states. (Note that the use of no abstraction corresponds to the situation in 
which ~ is the identity relation. In this case a state is considered relevant if it has not 
been encountered before.) 

The search algorithm we have outlined explores states in the concrete model, and in
effect maintains a concrete path to each state on the depth-first search stack. Thus any
abstraction relation is necessarily logically complete. To ensure that the abstraction
relation ~ is also logically sound, we need to ensure that its use does not cause the
search algorithm to miss any error states. Below, we present a condition for ensuring
this. We use the following notation: w, x, y, z denote states, and σ, τ denote paths.
We write σ_i to denote the i-th state in σ. We use → to denote the transition relation,
so x → y denotes that there is a transition from state x to state y.

A symmetric relation ~ on concrete states is a bisimulation [13] when it satisfies the
following condition:

$$\forall w, y, z:\; w \sim y \;\wedge\; y \rightarrow z \;\Rightarrow\; (\exists x:\; w \rightarrow x \;\wedge\; x \sim z) \qquad (1)$$

Thus states w and y are bisimilar if, whenever there is a transition from y to z, there is
a successor x of w such that x and z are also bisimilar. Given a bisimulation ~, we
say paths σ and τ correspond when ∀i: σ_i ~ τ_i.

The importance of bisimulation is given by the following theorem [1,2].

Theorem. Let ~ be a bisimulation, and let AP be a set of atomic propositions such
that every proposition P in AP satisfies

$$\forall x, y:\; P(x) \;\wedge\; x \sim y \;\Rightarrow\; P(y) \qquad (2)$$

Then, any two bisimilar states satisfy the same set of CTL state formulas over
propositions in AP. Furthermore, any two corresponding paths satisfy the same set of
CTL path formulas over propositions in AP. □

This means that when conditions (1) and (2) are both satisfied, the abstraction will 
preserve logical soundness. 
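For a small finite model, condition (1) can be checked by brute force. The sketch below does this for the counter model of Section 2.2, taking ~ to be "x + y has the same parity" and ranging over all concrete states (x, y) with 0 ≤ x < M and 0 ≤ y < N. The function names are hypothetical. Since every state in that model has exactly one successor, the existential quantifier in (1) reduces to comparing the two successors. Condition (2) holds by construction for the proposition "x + y is even".

```c
#include <assert.h>

/* Abstract value of a concrete state: parity of x + y. */
static int parity(int x, int y)
{
    return (x + y) % 2;
}

/* Returns 1 if the parity relation satisfies condition (1) on the model
 * x := (x+1)%M; y := (y+1)%N, and 0 if some related pair of states has
 * unrelated successors. */
int parity_is_bisimulation(int M, int N)
{
    assert(M >= 1 && N >= 1);
    for (int wx = 0; wx < M; wx++)
        for (int wy = 0; wy < N; wy++)
            for (int yx = 0; yx < M; yx++)
                for (int yy = 0; yy < N; yy++) {
                    if (parity(wx, wy) != parity(yx, yy))
                        continue;       /* w and y are not related */
                    /* unique successors of w and of y */
                    int xs = parity((wx + 1) % M, (wy + 1) % N);
                    int zs = parity((yx + 1) % M, (yy + 1) % N);
                    if (xs != zs)
                        return 0;       /* condition (1) is violated */
                }
    return 1;
}
```

When M and N are both even the check succeeds. For M = 3, N = 4 it fails: (0,0) and (2,2) are related, but their successors (1,1) and (0,3) are not.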

3. Two Sample Applications 
3.1. Tic Tac Toe 

We will illustrate the use of the new verification option, and the types of data abstrac- 
tion it supports, with a small example. For this example we will use a model of the 
game of tic tac toe. First, we will write the model in basic PROMELA, as a pure SPIN 
model and show its complexity. Then we will rewrite the model to include some 
operations in embedded C code, and we will show how abstractions can now be used 
to lower the verification complexity well below what was possible with the pure SPIN 
model, without sacrificing accuracy. 

The pure PROMELA version of the model is shown in Figure 3. 

We represented the 3x3 board in a two-dimensional array b, constructed with the help 
of PROMELA typedef declarations. Because the players in this game strictly alter- 
nate on moves, we can use a single process and record in a bit variable z which 
player will make the next move. Player 0 will always make the first move here. A 
square has value 0 when empty, and is set to either 1 or 2 when it is marked by one of 
the two players. In the first if statement, a player picks any empty square to place a 
mark. When no empty squares are left, a draw is reached and the game stops. When 
a move could be made, the player checks for a win, and if one is found it prints the 
board configuration for the winning position and forces a deadlock (to allow us to dis- 
tinguish these states from normal termination where the process exits). We have 
enclosed the computation in an atomic sequence, to achieve that no intermediary 
states are stored during the model checking process, only the final board positions that 
are computed. 

The verification of this model explores 5,510 states. Clearly, we are not exploiting the 
fact that the game board has many symmetries. There are, for instance, both rotational 
symmetries and mirror symmetries (left/right and top/bottom) that could be taken into 
account to reduce verification complexity. In principle, the SPIN model could be 
rewritten to account for these symmetries, but this is surprisingly hard to do, and risks 
the introduction of inaccuracy in the verification process. 

With careful reasoning, we can see that there are only 765 unique board positions in
the game, and 135 of these positions are winning for one of the two players (see, e.g., [14]).
The maximum number of moves that can be made is further trivially 9. In our first
verification attempt we therefore did considerably more work than is necessary.

#define SQ(x,y)  !b.r[x].s[y] -> b.r[x].s[y] = z+1
#define H(v,w)   b.r[v].s[0]==w && b.r[v].s[1]==w && b.r[v].s[2]==w
#define V(v,w)   b.r[0].s[v]==w && b.r[1].s[v]==w && b.r[2].s[v]==w
#define UD(w)    b.r[0].s[0]==w && b.r[1].s[1]==w && b.r[2].s[2]==w
#define DD(w)    b.r[2].s[0]==w && b.r[1].s[1]==w && b.r[0].s[2]==w

typedef Row   { byte s[3]; };
typedef Board { Row r[3]; };

Board b;
bit z, won;

init {
    do
    :: atomic {     /* do not store intermediate states */
        !won ->
        if          /* all valid moves */
        :: SQ(0,0) :: SQ(0,1) :: SQ(0,2)
        :: SQ(1,0) :: SQ(1,1) :: SQ(1,2)
        :: SQ(2,0) :: SQ(2,1) :: SQ(2,2)
        :: else -> break    /* a draw: game over */
        fi;

        if          /* winning positions */
        :: H(0,z+1) || H(1,z+1) || H(2,z+1)
        || V(0,z+1) || V(1,z+1) || V(2,z+1)
        || UD(z+1)  || DD(z+1) ->
            /* print winning position */
            printf("%d %d %d\n%d %d %d\n%d %d %d\n",
                b.r[0].s[0], b.r[0].s[1], b.r[0].s[2],
                b.r[1].s[0], b.r[1].s[1], b.r[1].s[2],
                b.r[2].s[0], b.r[2].s[1], b.r[2].s[2]);
            won = true      /* and force a stop */
        :: else -> z = 1 - z    /* continue */
        fi
       }    /* end of atomic */
    od
}

Fig. 3. Tic Tac Toe, Pure PROMELA Model. 

We will now revise the model to make use of a C function to store the board configu- 
ration in a C data structure, and to perform the moves that are non-deterministically 
selected with a SPIN model (which now starts to perform the function of a test-har- 
ness around a C application). 

We will retain the same basic structure of the algorithm. Again, a win will result in a
deadlock, and a draw will lead to normal process termination. In the first version of
the model with embedded C code, shown in Figure 4, we track and match all relevant
external data, which in this case includes just the board configuration.

We have introduced variables x and y to record the square that is selected in the first
part of the algorithm. The location of the square is passed to the C function that will
now place the mark in that square and check for a win. The C function play()
returns either a 2 if the last move made produced a win, or a 1 if it did not and the turn
should go to the next player, as before.




Model-Driven Software Verification



#define SQ(a,b)  c_expr { (!board[a][b]) } -> x=a; y=b

c_decl {
    extern short board[3][3];
    extern short play(int, int, int);
};

c_track "&board[0][0]" "sizeof(board)";   /* matched */

byte x, y, z, won;

init {
    do
    :: atomic { !won ->
        if      /* all valid moves */
        :: SQ(0,0) :: SQ(0,1) :: SQ(0,2)
        :: SQ(1,0) :: SQ(1,1) :: SQ(1,2)
        :: SQ(2,0) :: SQ(2,1) :: SQ(2,2)
        :: else -> break    /* a draw */
        fi;

        c_code {
            switch (play(now.x, now.y, now.z+1)) {
            default: printf("cannot happen\n"); break;
            case 1: now.z = 1 - now.z; break;
            case 2: now.won = 1; break;   /* force a stop */
            }
            now.x = now.y = 0;  /* reset */
        }
       }
    od
}



Fig. 4. Tic Tac Toe, Version with Embedded C Code. 

The external C function is shown in Figure 5. Perhaps not surprisingly, this model 
explores the same number of states as the pure SPIN model, and declares the same 
number of winning positions, though it does not have to search as deeply into the 
depth-first search tree (31 steps instead of 40 for the earlier model). 

We will now change the last model into one that uses data abstraction. The new 
model is shown in Figure 6. 

We have turned off state matching on the board configuration, while retaining the 
tracking capability that allows us to perform an accurate depth-first search. We have 
also introduced a new integer variable named abstract that we will use to record an 
abstract value of the board configuration, taking into account all symmetries that exist 
on the game board. This value is both tracked and matched. The rest of the test har- 
ness specification is unchanged. 

We must now extend the C function a little, to provide the computation of the abstract
board value. We do so by calling an extra function board_value() that is called in
function play() immediately after the new mark is placed.

The details of this computation, shown in the Appendix, are not too important. Suffice
it to say that each board configuration is assigned a unique value between 0 and
19682, by assigning a numeric place value to each square. The board value is computed
for each rotation and mirror reflection of the board, and the minimum of the 8
resulting numbers is selected as a canonical representation of the set. That number is
stored in the state descriptor, and used in state matching operations. A state will now
match if a board configuration is encountered that is equivalent to one previously
seen, taking all rotational and mirror symmetries into account. Yet the execution of
the actual code always works with the full detail on the actual board configuration.

#include <stdio.h>

#define H(a,b)  (board[a][0]==b && board[a][1]==b && board[a][2]==b)
#define V(a,b)  (board[0][a]==b && board[1][a]==b && board[2][a]==b)
#define UD(b)   (board[0][0]==b && board[1][1]==b && board[2][2]==b)
#define DD(b)   (board[2][0]==b && board[1][1]==b && board[0][2]==b)

short board[3][3];

short
play(int x, int y, int z)
{
    board[x][y] = z;    /* place mark */

    /* check for win: */
    if (H(0,z) || H(1,z) || H(2,z)
    ||  V(0,z) || V(1,z) || V(2,z)
    ||  DD(z)  || UD(z))
    {   printf("%d %d %d\n%d %d %d\n%d %d %d\n",
            board[0][0], board[0][1], board[0][2],
            board[1][0], board[1][1], board[1][2],
            board[2][0], board[2][1], board[2][2]);
        return 2;   /* last move wins */
    }
    return 1;   /* game continues */
}

Fig. 5. C Source Code for play().
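One way to compute such a canonical board value is sketched below. This is not the Appendix code, which is not reproduced here: the base-3 encoding order, the helper names, and the string-based convenience wrapper are assumptions made for illustration. Each square contributes a base-3 digit (0 empty, 1 or 2 for the players), and the minimum encoding over the four rotations and their mirror images is returned.

```c
#include <assert.h>
#include <string.h>

/* Base-3 place value of a board: 0 .. 3^9 - 1 = 19682. */
static int encode(short b[3][3])
{
    int v = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            assert(b[i][j] >= 0 && b[i][j] <= 2);
            v = 3 * v + b[i][j];
        }
    return v;
}

/* Rotate the board by 90 degrees clockwise. */
static void rotate(short in[3][3], short out[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            out[j][2 - i] = in[i][j];
}

/* Mirror the board left/right. */
static void mirror(short in[3][3], short out[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            out[i][2 - j] = in[i][j];
}

/* Minimum encoding over the 8 symmetries (4 rotations, each also mirrored). */
int board_value(short board[3][3])
{
    short cur[3][3], tmp[3][3];
    int best = 19683;               /* larger than any encoding */

    memcpy(cur, board, sizeof cur);
    for (int r = 0; r < 4; r++) {
        int v = encode(cur);
        if (v < best) best = v;
        mirror(cur, tmp);
        v = encode(tmp);
        if (v < best) best = v;
        rotate(cur, tmp);
        memcpy(cur, tmp, sizeof cur);
    }
    return best;
}

/* Convenience wrapper: cells is a 9-character string of '0', '1', '2',
 * listing the squares row by row. */
int board_value_from_string(const char *cells)
{
    short b[3][3];

    assert(strlen(cells) == 9);
    for (int i = 0; i < 9; i++)
        b[i / 3][i % 3] = (short)(cells[i] - '0');
    return board_value(b);
}
```

Under this encoding, a single mark of player 1 in any corner maps to the same value, 1, and a board and its 180-degree rotation always receive equal values.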

Table 1. TicTacToe Verification.

Version                                            States   Depth   Wins
-------------------------------------------------------------------------
Pure SPIN Model                                      5510      40    942
Model with Embedded C Code                           5510      31    942
Model with Embedded C Code and Data Abstraction       771      31    135
Minimum Required for Solution                         765       9    135


With this data abstraction, the number of reachable states is reduced to 771, with
135 of these states declared as winning positions, as shown in Table 1. Since the
actual number of uniquely different board configurations is 765, SPIN encounters just
six cases here where it revisits an old configuration with different values for the
additional state variables x, y, or z.

    c_decl {
        extern short board[3][3];
        extern short play(int, int, int);
        extern short abstract;  /* board value */
    };

    c_track "&board[0][0]" "sizeof(board)" "UnMatched";
    c_track "&abstract" "sizeof(short)";    /* matched */

    byte x, y, z, won;

    init {
        do
        :: atomic { !won ->
            if  /* all valid moves */
            :: c_expr { (!board[0][0]) } -> x = 0; y = 0
            :: c_expr { (!board[0][1]) } -> x = 0; y = 1
            :: c_expr { (!board[0][2]) } -> x = 0; y = 2
            :: c_expr { (!board[1][0]) } -> x = 1; y = 0
            :: c_expr { (!board[1][1]) } -> x = 1; y = 1
            :: c_expr { (!board[1][2]) } -> x = 1; y = 2
            :: c_expr { (!board[2][0]) } -> x = 2; y = 0
            :: c_expr { (!board[2][1]) } -> x = 2; y = 1
            :: c_expr { (!board[2][2]) } -> x = 2; y = 2
            :: else -> break    /* a draw */
            fi;
            c_code {
                switch (play(now.x, now.y, now.z+1)) {
                default: printf("cannot happen\n"); break;
                case 1: now.z = 1 - now.z; break;
                case 2: now.won = 1; break; /* force a stop */
                }
                now.x = now.y = 0;  /* reset */
            }
           }
        od
    }

Fig. 6. Tic Tac Toe, With Data Abstraction.

It should be noted that we could also have used the c_track mechanism to introduce
a new state variable that captures the non-abstracted board value (i.e., without taking
into account the equivalence of rotations and mirror reflections of the game board).
By storing and tracking the board value as a single 4-byte integer, rather than as the
original 9-byte array of marks, we then define an application specific data compression
on part of the state information, and similarly benefit by achieving a reduction of
the memory requirements. This means that our new c_track mechanism can exploit not
just user-defined abstractions, but also user-defined lossless or lossy data compression
methods. Of course, for verification accuracy we will normally want to restrict
the use to sound abstractions and lossless data compression.
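As a minimal sketch of what lossless compression of the board could look like, the functions below pack the nine cells into one integer in base 3 and recover them again. The names pack_board and unpack_board are hypothetical and not part of the model shown in Fig. 6; cell values are assumed to be 0, 1, or 2.

```c
#include <stdint.h>

/* Pack a 3x3 board with cell values 0..2 into one integer (base 3).
 * Cell (0,0) becomes the least significant digit. */
uint32_t pack_board(short b[3][3])
{
    uint32_t v = 0;
    for (int i = 2; i >= 0; i--)
        for (int j = 2; j >= 0; j--)
            v = v * 3 + (uint32_t)b[i][j];
    return v;
}

/* Inverse of pack_board: recover every cell from the packed value. */
void unpack_board(uint32_t v, short b[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            b[i][j] = (short)(v % 3);
            v /= 3;
        }
}
```

Because unpack_board restores every cell exactly, no two distinct boards share a packed value, which is what makes the compression lossless. The canonical (symmetry-reduced) value, by contrast, deliberately maps up to 8 boards to one number.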

3.2. Soundness 

To see that the abstractions we have used in this example are logically sound, we
show that they satisfy the conditions (1) and (2) from Section 2.3, as required by the
theorem.

For given board configurations $b_0$ and $b_1$, the relation $b_0 \sim b_1$ is defined to hold
when $b_0$ and $b_1$ evaluate to the same abstract board value (i.e., the configurations are
equivalent up to rotations and reflections). It is easy to check that the state predicate
$P$, where $P(b)$ denotes that $b$ is a winning configuration, satisfies condition (2), since a
win is invariant under rotations and reflections.

To see that the abstraction relation $\sim$ satisfies condition (1), note that each transition
in the model is either (i) a move by player 1, (ii) a move by player 2, or (iii) a win. We
must then show that, for any two equivalent board configurations $b_0$ and $c_0$, if there is
a transition from $b_0$ to $b_1$, then there is also a transition from $c_0$ to $c_1$ such that
$c_1 \sim b_1$.

Consider first a transition of type (i), in which player 1 places a mark at position $(i, j)$.
Suppose that $c_0$ is a reflection of $b_0$ across the vertical axis. Then, it is easy to see
that the transition that places the same mark at position $(i, 2 - j)$ must be enabled in
$c_0$, and results in a state $c_1$ that is equivalent to $b_1$. A similar argument holds for the
other ways in which $c_0$ and $b_0$ may be equivalent, and for transitions of type (ii).

For transitions of type (iii), it suffices to note that function play returns the same
value for all equivalent board configurations.
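The claim behind condition (2), that a win is invariant under rotations and reflections, is small enough to check by brute force. The sketch below is an illustration only; wins and count_invariance_violations are hypothetical names. It enumerates all 3^9 assignments of marks (including boards that are unreachable in play) and compares the win predicate on each board with the predicate on its mirror image and on its 90-degree rotation. Since these two transforms generate all 8 symmetries, checking them suffices.

```c
/* Returns 1 if mark z (1 or 2) fills a row, column, or diagonal of b. */
static int wins(short b[3][3], short z)
{
    for (int i = 0; i < 3; i++) {
        if (b[i][0] == z && b[i][1] == z && b[i][2] == z) return 1;
        if (b[0][i] == z && b[1][i] == z && b[2][i] == z) return 1;
    }
    if (b[0][0] == z && b[1][1] == z && b[2][2] == z) return 1;
    if (b[2][0] == z && b[1][1] == z && b[0][2] == z) return 1;
    return 0;
}

/* Counts (board, mark) pairs for which the win predicate differs between
 * a board and its mirror image or its 90-degree clockwise rotation. */
int count_invariance_violations(void)
{
    int bad = 0;

    for (int code = 0; code < 19683; code++) {      /* 3^9 boards */
        short b[3][3], m[3][3], r[3][3];
        int v = code;

        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) {
                b[i][j] = (short)(v % 3);
                v /= 3;
            }
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) {
                m[i][2 - j] = b[i][j];              /* vertical-axis mirror */
                r[j][2 - i] = b[i][j];              /* clockwise rotation */
            }
        for (short z = 1; z <= 2; z++)
            if (wins(b, z) != wins(m, z) || wins(b, z) != wins(r, z))
                bad++;
    }
    return bad;
}
```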

3.3. A Larger Application 

For a larger application we will discuss the verification of one of the modules from 
the flight software for JPL’s Mars Exploration Rovers (MER). 

The MER software contains 11 threads of execution. Each thread serves one specific
application, such as imaging, controlling the robot arm, communicating with earth, 
and driving. There are 15 shared resources on the rover, to which access must be con- 
trolled by an arbiter, which is the target of our verification. The arbiter module pre- 
vents potential conflicts between resource requests, and enforces priorities. For 
instance, it would not make sense to start a communication session with earth while 
the rover is driving. The policy in this case is that communication is more important 
than driving, so when a request for communication is received while the rover is driv- 
ing, the arbiter will make sure that the permission to use the drive motors is rescinded 
in favor of a new permission to use the rover’s antennas. 
Fig. 7. Sample Communication Scenario.

Figure 7 shows a generic scenario for communication between user threads and the
arbiter. In the scenario shown, user U0 requests access to a resource, which is granted
by the arbiter. Next, a different user U1 makes a resource request that conflicts with
the first one, but has precedence. The arbiter now sends a Rescind message to the
first user, and waits for the confirmation that the resource is no longer used. Then the
arbiter sends a Grant message to user U1. A new request from the first user while the
second user still retains possession of the resource is now summarily denied by the
arbiter. Eventually, the use of every resource must be completed by sending a Cancel
message to the arbiter, which notifies the arbiter that the resource is now available
for other users.
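The message sequence of this scenario can be mimicked with a toy, single-resource arbiter. The sketch below is purely hypothetical: it is not derived from the MER flight code, the type and function names are invented for illustration, priorities are plain integers (larger wins), and the rescind confirmation and the Cancel message are folded into one function.

```c
enum reply { GRANT, DENY, RESCIND };

struct arbiter {
    int owner, owner_prio;      /* current holder, -1 if the resource is free */
    int pending, pending_prio;  /* requester waiting for a rescind, -1 if none */
};

void arbiter_init(struct arbiter *a)
{
    a->owner = a->pending = -1;
    a->owner_prio = a->pending_prio = 0;
}

/* Handle a Request. RESCIND means that a Rescind message goes to the current
 * owner and the requester waits until the owner confirms the release. */
enum reply arbiter_request(struct arbiter *a, int user, int prio)
{
    if (a->owner < 0) {
        a->owner = user;
        a->owner_prio = prio;
        return GRANT;
    }
    if (a->pending < 0 && prio > a->owner_prio) {
        a->pending = user;
        a->pending_prio = prio;
        return RESCIND;
    }
    return DENY;
}

/* Handle a Cancel (or rescind confirmation) from the owner.
 * Returns the user that is granted the resource next, or -1 if none. */
int arbiter_cancel(struct arbiter *a, int user)
{
    if (user != a->owner)
        return -1;              /* messages from non-owners are ignored */
    a->owner = a->pending;
    a->owner_prio = a->pending_prio;
    a->pending = -1;
    a->pending_prio = 0;
    return a->owner;
}
```

Replaying the scenario: U0 requests and is granted; U1 requests with higher priority and triggers a rescind; U0 confirms and U1 becomes the owner; a renewed request by U0 is denied; U1 finally cancels and the resource is free again.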

The arbiter module consists of about 3,000 lines of source code, written in ANSI standard
C. The arbiter also makes use of a lookup table that records which combinations
of resource requests conflict, and what the various priorities are.

With 11 users competing for 15 resources, the complexity of a full-scale exhaustive 
verification quickly becomes intractable. With the bitstate (supertrace) search mode, 
SPIN can randomly prune these large search spaces, within limits set by the size of 
available physical memory on the machine that is used. For exhaustive coverage a 
different strategy must be followed. One such strategy is divide and conquer. By 
breaking the problem down into smaller subproblems that can be checked exhaus- 
tively we can build confidence that the larger problem has the desired properties 
(although in this case we cannot conclusively prove it). 

As an example, we will look at a subproblem with 3 user processes, competing for 
access to just 3 of the available resources, cyclically, and in random order. Without 
the use of abstraction, a problem of this size is at the edge of what can be verified 
exhaustively with roughly 1 Gbyte of available memory. An experiment like this can 
be repeated for different subproblems by making different selections of the 3 
resources competed for, slowly increasing the confidence in the correctness of the 
complete solution. 
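To get a feeling for how many such experiments there are, one can count the possible selections. The sketch below assumes that every unordered choice of 3 out of the 15 resources counts as one subproblem, which is our reading rather than something the experiments above prescribe; binomial is a hypothetical helper.

```c
/* Number of ways to choose k items out of n, computed without overflow
 * for small arguments. After step i, r equals C(n-k+i, i), so each
 * division is exact. */
unsigned long binomial(unsigned n, unsigned k)
{
    unsigned long r = 1;

    if (k > n)
        return 0;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}
```

Under this assumption there are binomial(15, 3) = 455 different resource selections for a fixed set of 3 user processes.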



Table 2. MER Arbiter Verification.

Version                                        States   Time(s)   Mem(Mb)
--------------------------------------------------------------------------
Pure SPIN Model                               272,068       2.2      41.8
Model with Embedded Code                   11,453,800    1458.0     701.7
Model with Embedded Code and Abstraction      261,543       5.1      55.2



Table 2 records the number of reachable states for three different versions of a verifi- 
cation model for the arbiter. The first version is a hand-built pure SPIN model of the 
arbiter algorithm, including only the lookup table from the original code as embedded 
C code. This model counts 245 lines of PROMELA code, plus 77 lines for the arbiter 
lookup table. 

The second version uses the original arbiter C code, stubbed to isolate it from the rest 
of the flight code, with all relevant data objects that contain state information tracked 
and matched, using c_track statements, without abstraction. There are approximately
4,400 bytes of state information that are tracked in this way. This information includes,
for instance, a linked list of current and pending reservations, and a freelist of reserva- 
tion slots. A hand-built test-harness of just 110 lines of PROMELA surrounds the 
arbiter code (as it did for the much simpler tictactoe example we discussed before), 
simulating the actions of the 3 selected user processes. The verification for this ver- 
sion of the model could only be completed with the hashcompact state compression 
option enabled, which effectively reduced memory use to about 127 bytes per state. 
More effective than this compression option, though, is to use the data abstraction 
method we discussed. In the third version of the verification model we added these 
abstractions, restricting the state information that is recorded to its bare essence. In 
this version we exclude, for instance, the values of pointers that are used to build the 
linked lists. 
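The idea of leaving pointer values out of the matched state can be illustrated with a small sketch. Everything in it is hypothetical (the node layout, the bound MAX_RES, and the function name are not taken from the arbiter code): the list contents are copied, in list order, into a fixed-size record, so that two lists with the same contents compare equal even when their nodes live at different addresses.

```c
#include <string.h>

#define MAX_RES 8               /* hypothetical bound on the list length */

struct reservation {            /* hypothetical node layout */
    int user;
    int resource;
    struct reservation *next;   /* pointer value: irrelevant for matching */
};

struct abs_state {              /* pointer-free abstract representation */
    int count;
    int user[MAX_RES];
    int resource[MAX_RES];
};

/* Copy the payload of each node, in list order, into out. */
void abstract_list(const struct reservation *head, struct abs_state *out)
{
    memset(out, 0, sizeof *out);    /* unused slots always compare equal */
    for (const struct reservation *p = head;
         p != NULL && out->count < MAX_RES; p = p->next) {
        out->user[out->count] = p->user;
        out->resource[out->count] = p->resource;
        out->count++;
    }
}
```

A record like abs_state is the kind of object one would track and match, while the memory holding the list itself would be tracked without being matched.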

Each of the three versions can find counter-examples to logic properties that have 
known violations (including both safety and liveness properties). The first model,
though efficient, can leave doubt about the accuracy of the modeling effort. The sec- 
ond model incurs the penalty of an implementation level verification, carrying along 
much more data than is necessary to prove the required properties. The third version 
of the model restores the relatively low complexity of a hand-built model, but has the 
benefit of precision and strict adherence to the implementation level code. 

4. Related Work 

There have been several different approaches to the direct verification of implementa- 
tion level C code, following the early work on automated extraction of verification 
models from implementation level code, as detailed for instance in [3,7,8].

Perhaps best known is the work on Verisoft [6], which is based on the use of partial 
order reduction theory in what is otherwise a state-less search. In this approach, 
application level code is instrumented in such a way that its execution can be con- 
trolled at specific points in the code, e.g., at points where message passing operations 
occur or where scheduling decisions are made. The search along a given path of 
execution is stopped when a user-defined depth is reached, and then restarted from the 
predefined initial system state to explore alternative paths. The advantage of this 
approach is that it can consume considerably less memory than the traditional state
space exploration methods used in logic model checking. It can therefore handle 
large applications. A relatively small disadvantage is that the method requires code 
instrumentation. A more significant disadvantage, compared to the method we have 
introduced in this paper, is that since no state space is maintained, none of the advan- 
tages from state space storage are available, such as systematic depth-first search, the 
verification of not only safety but also liveness properties, and the opportunity to 
define systematic abstractions. Abstraction in particular can provide significant per- 
formance gains. We believe that the methodology introduced in this paper is the first 
to successfully combine data abstraction techniques and unrestricted logic model 
checking capabilities with the verification of implementation level code. 

A second approach that is comparable to our own is the work on the CMC tool [12].
In this tool, an attempt is made to capture as much state information as possible and to
store it in a state space using aggressive compression techniques, similar to the
hashcompact and bitstate hashing methods used in SPIN [9]. Detailed state information is
kept on the depth-first search stack, but no methodology is available to distinguish
between state information that should be matched, and state information that is only
required to maintain data integrity. Potentially, though, this method could be
extended with the type of data abstraction techniques we have described in this paper.
We believe that effective use of data abstraction techniques will prove to be the key to 
the successful application of logic model checking techniques in practice. 

5. Conclusions 

The method we have described for verifying reactive software applications combines 
data abstraction with implementation level verification. The user provides a test-har- 
ness, written in the language of the model checker, that non-deterministically selects 
inputs for the application that can drive it through all its relevant states. Correctness 
properties can be verified in the usual way: by making logical statements about reach- 
able and unreachable states, or about feasible or infeasible executions. The state 
tracking capability allows us to perform full temporal logic verification on 
implementation level code.

The basic method of maintaining both concrete and abstract representations of states 
in the search procedure is very similar to algorithms that have been proposed earlier 
for using (predefined) symmetry reductions in model checkers, e.g. in [1,4].

The method is the easiest to apply in the verification of single-threaded code, with 
well-defined input and output streams. The two examples we discussed in this paper 
are both of that type. The method is not restricted to such applications though.
Multi-threaded code can be handled, but requires more care. Each thread in the application
will need to be prepared to run as a standalone thread, with clearly defined inputs and
outputs. The user must now identify the portions of program code that can be run as
atomic blocks, setting the appropriate level of interleaving for the model checking 
runs. The test-harness that the user prepares now drives the thread executions directly, 
selecting the proper level of granularity of execution. The application we studied in 
[5] is of this type, and could be adapted to use the new method of data abstraction we 
have discussed here. 

The capability to redefine how state information is to be represented, or abstracted, is 
also similar to the view function in TLC [11]. In our case, the abstraction can be 
defined in any way that C or C++ allows, but it is restricted to the representation of 
external data objects. Most of the data in a SPIN model could be treated as such (e.g., 
by declaring them to be hidden within the SPIN model itself), with the exception 
only of the program counters of active processes. 

In the setup we have described, each call on the application level code is assumed to 
execute to completion without interleaving of other actions (i.e., atomically). This is 
the most convenient way to proceed, but we are not restricted to it. By using a model 
extractor, such as FeaVer or MODEX [10,15], we can convert selected functions in the 
application into PROMELA models with embedded C code, optionally using addi- 
tional source level abstraction functions, and generate a finer-grained model of execu- 
tion. The instrumentation required for model extraction can, however, complicate the 
verification process, and require deeper knowledge of the application. 
APPENDIX

    #define MAXVAL      19682   /* 3 x 3^8 - 1 */
    #define B(a,b,c,d)  board[a][b]*r[c][d]

    int abstract;

    int r0[3][3] = {
        {    1,    3,    9, },
        {   27,   81,  243, },
        {  729, 2187, 6561, },
    };

    int r1[3][3] = {
        {  729,   27,    1, },
        { 2187,   81,    3, },
        { 6561,  243,    9, },
    };

    int r2[3][3] = {
        { 6561, 2187,  729, },
        {  243,   81,   27, },
        {    9,    3,    1, },
    };

    int r3[3][3] = {
        {    9,  243, 6561, },
        {    3,   81, 2187, },
        {    1,   27,  729, },
    };

    int
    comp_row(int r[3][3], int L, int R, int T, int B)
    {
        return  B(T,L,0,0) + B(T,1,0,1) + B(T,R,0,2) +
                B(1,L,1,0) + B(1,1,1,1) + B(1,R,1,2) +
                B(B,L,2,0) + B(B,1,2,1) + B(B,R,2,2);
    }

    void
    min_row(int r[3][3])
    {   int v;

        v = comp_row(r, 0, 2, 0, 2); if (v < abstract) abstract = v;
        v = comp_row(r, 2, 0, 0, 2); if (v < abstract) abstract = v;
        v = comp_row(r, 0, 2, 2, 0); if (v < abstract) abstract = v;
        v = comp_row(r, 2, 0, 2, 0); if (v < abstract) abstract = v;
    }

    void
    board_value(void)
    {
        abstract = 2*MAXVAL;
        min_row(r0);
        min_row(r1);
        min_row(r2);
        min_row(r3);
    }

Abstract. We propose an algorithm to find a counterexample to some
property in a finite state program. This algorithm is derived from SPIN's,
but it finds a counterexample faster than SPIN does. In particular it
still works in linear time. Compared with SPIN's algorithm, it requires
only one additional bit per state stored. We further propose another
algorithm to compute a counterexample of minimal size. Again, this
algorithm does not use more memory than SPIN does to approximate a
minimal counterexample. The cost to find a counterexample of minimal
size is that one has to revisit more states than SPIN. We provide an
implementation and discuss experimental results.



1 Introduction 

Model-checking is used to prove the correctness of properties of hardware and 
software systems. When the model is incorrect, locating errors is important to 
provide hints on how to correct either the system or the property to be checked. 
Model checkers usually exhibit counterexamples, that is, faulty execution traces 
of the system. The simpler the counterexample is, the easier it will be to locate, 
understand and fix the error. A counterexample can mean that the abstraction 
of the system (formalized as the model) is too coarse; several techniques can be 
used to refine the model, guided by the counterexample found by the model- 
checker [3,1,7]. The refinement stage is done manually or automatically. In any 
case, it is important to compute small counterexamples (ideally of minimal size) 
in case the property is not satisfied: they are easier to understand, they can be 
processed more rapidly by automatic tools, and thus they make it possible to 
correct underlying errors more easily. 

It is well-known that verifying whether a finite state system $\mathcal{M}$ satisfies
an LTL property $\varphi$ is equivalent to testing whether a Büchi automaton
$A = A_{\mathcal{M}} \cap A_{\neg\varphi}$ has no accepting run [11], where $A_{\mathcal{M}}$ is a Kripke
structure describing the system and $A_{\neg\varphi}$ is a Büchi automaton describing
executions that violate $\varphi$. It is easy, in theory, to determine whether a Büchi
automaton has at least one accepting run. Since there is only a finite number of
accepting states, this problem is equivalent to finding a reachable accepting state
and a loop around it. A counterexample to $\varphi$ in $\mathcal{M}$ can then be given as a path
$\rho = \rho_1\rho_2$ in the Büchi automaton, where $\rho_1$ is a simple (loop-free) path from the
initial state to an accepting state, and $\rho_2$ is a simple loop around this accepting
state (see Figure 1).

Fig. 1. An accepting path in a Büchi automaton

The model-checker SPIN [9,8] can find counterexamples by exploring
on the fly the synchronized product of the system and the property. Our goal is
to find short counterexamples while sparing memory. The first trivial remark is
that we can reduce the length of a counterexample if we do not insist on the fact
that the loop starts from an accepting state. Hence, we consider counterexamples
of the form $\rho = \rho_1\rho_2\rho_3$ where $\rho_1\rho_2$ is a path from the initial state to an accepting
state, and $\rho_3\rho_2$ is a simple loop around this accepting state (see Figure 2). A
minimal counterexample can then be defined as a path of this form, such that
the length of $\rho$ is minimal.

Fig. 2. An accepting path in a Büchi automaton



Finding a counterexample, even of minimal size, can of course be done in polynomial
time using minimal paths algorithms based on breadth-first traversals.
However, breadth-first traversals are not well-suited to detect loops. Moreover,
the model of the system frequently comes from several components working
concurrently, and the resulting Büchi automaton can be huge. Memory is therefore
a critical resource and, for instance, we cannot afford to store the minimal
distance between all pairs of states. For this reason, we retain SPIN's approach and
use a depth-first search-like algorithm [10,5]. Depth-first traversals are well
suited to detect loops, but they are not adapted for computing distances between
states, which makes the problem more difficult than it first appears.

With this approach, there are actually two difficulties: the first one is to 
find one counterexample, the second one is to find a small counterexample, and 
ideally a minimal one. 

SPIN has an option to reduce the size of counterexamples it finds. Yet, it
does not provide the smallest one and results frequently remain too large and
difficult to read, even when considering simple systems. For instance, on a natural
liveness property on Dekker's mutual exclusion algorithm, SPIN provides a
counterexample with 173 transitions. In this case, it is not difficult to see that
an error occurs after 23 steps. The reason is that SPIN's algorithm for reducing
the size of counterexamples misses lots of them and therefore fails to find the
shortest one. Our contribution is the following:

- We propose an algorithm to find a counterexample of a Promela model in
  linear time. This algorithm is derived from SPIN's, but finds a counterexample
  faster than SPIN does. Moreover, compared with SPIN's algorithm,
  it only requires one additional bit per state stored.

- We propose another algorithm to compute a counterexample of minimal
  size, once a first counterexample has been found. This algorithm does not
  use more memory than SPIN does with option -i when trying to reduce the
  size of counterexamples. The cost of finding the shortest counterexample is
  to revisit more states than SPIN does. However, the algorithm can actually
  output a sequence of counterexamples of decreasing length found during its
  execution and can be stopped at any time. The algorithm is also well suited
  for bounded-model checking: given a maximal size of the counterexamples
  to be found, it returns one of the smallest such counterexamples, if any.

- We have implemented a version of the last algorithm whose results are indeed
  much smaller than those given by SPIN. For instance, for Dekker's algorithm,
  it actually finds the 23-state counterexample.

- We finally propose other improvements to SPIN's algorithm.

The paper is organized as follows. In Section 2, we describe the algorithm to 
find a first counterexample and we prove its correctness. However, there is no 
guarantee that this counterexample is of minimal size. In Section 3, we present an 
algorithm finding a minimal counterexample. While explaining these algorithms, 
we exhibit various problems that may arise when computing a counterexample 
with the current SPIN algorithm. An implementation and experimental results 
are described in Section 4. 



2 Finding the First Counterexample 

Let $A = (S, E, s_1, F)$ be a Büchi automaton where $S$ is a finite set of states,
$E \subseteq S \times S$ is the transition relation, $s_1 \in S$ is the initial state and $F \subseteq S$ is
the set of accepting states. Usually transitions are labeled with actions but since
these labels are irrelevant for the emptiness problem, they are ignored in this
paper. In pictures, the initial state is marked with an ingoing edge and accepting
states are doubly circled. If a state has $k$ outgoing transitions, we number them
from 1 to $k$. Transitions from a state will be considered by the algorithms in the
order given by their labels.

A path in an automaton is a sequence of states $\gamma = t_1 t_2 \cdots t_k$ (also denoted
$t_1, t_2, \ldots, t_k$) such that for all $i < k$ there is a transition from $t_i$ to $t_{i+1}$. We call
$k$ the length of $\gamma$, and we denote it by $|\gamma|$. The empty path, with no state, is
denoted by $\varepsilon$ and it has length 0. We say that $\gamma$ is simple if $t_i \neq t_j$ for all $i \neq j$.

A loop is a path $t_1 t_2 \cdots t_k$ with $t_k = t_1$. A loop is accepting if it contains an
accepting state. A loop $t_1 t_2 \cdots t_k$ is a cycle if $t_1 t_2 \cdots t_{k-1}$ is a simple path.

An accepting path, or counterexample, is of the form $\gamma = s_1 \cdots s_k \cdots s_{k+\ell}$
where $s_1 \cdots s_k$ is a path starting from the initial state and $s_k \cdots s_{k+\ell}$ is an
accepting loop. Abusing the language, we say that $\gamma$ is a simple accepting path
if in addition $s_1 \cdots s_k \cdots s_{k+\ell-1}$ is simple.
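These definitions translate directly into executable checks. The following sketch is an illustration under stated assumptions: states are numbered 0 to NSTATES-1, the transition relation is a boolean matrix, and the function names are hypothetical. It also assumes that a loop has at least two states, so that a single state without a self-transition is not counted as a loop.

```c
#include <stdbool.h>

#define NSTATES 4   /* hypothetical automaton size */

/* t[0..k-1] is a path if every consecutive pair is a transition in E. */
bool is_path(bool E[NSTATES][NSTATES], const int *t, int k)
{
    for (int i = 0; i + 1 < k; i++)
        if (!E[t[i]][t[i + 1]])
            return false;
    return true;
}

/* A sequence of states is simple if no state occurs twice. */
bool is_simple(const int *t, int k)
{
    for (int i = 0; i < k; i++)
        for (int j = i + 1; j < k; j++)
            if (t[i] == t[j])
                return false;
    return true;
}

/* t[0..n-1] is an accepting path if it is a path that starts in init and
 * whose suffix from index loop_start is a loop containing a state of F. */
bool is_accepting_path(bool E[NSTATES][NSTATES], const bool *F, int init,
                       const int *t, int n, int loop_start)
{
    if (n < 2 || loop_start < 0 || loop_start >= n - 1)
        return false;
    if (t[0] != init || !is_path(E, t, n) || t[n - 1] != t[loop_start])
        return false;
    for (int i = loop_start; i < n; i++)
        if (F[t[i]])
            return true;
    return false;
}
```

For an automaton with transitions 0→1, 1→2, 2→1, 2→3 and accepting state 2, the sequence 0, 1, 2, 1 is an accepting path with the loop starting at its second state, and it is a simple accepting path because its prefix 0, 1, 2 is simple.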

In this section, we describe an algorithm finding the first counterexample. It
is similar to the nested DFS described in [9,5,2,10], with an improvement that
avoids revisiting some states unnecessarily. This improvement is also useful when
minimizing the size of the counterexample.

Algorithm 1 uses 4 colors to mark states: white < blue < red < black. (We
also mark states in grey, but this is just for simplifying the proof.) The color of a
state can only increase. At the beginning, all states are white and the algorithm
DFS_blue is called on the initial state $s_1$.

Two DFSs alternate, the blue and red ones. The blue DFS is used to locate 
reachable accepting states and to start red DFSs from these accepting states in 
postfix order with respect to the covering tree defined by the blue DFS. A red 
DFS starts (and interrupts the blue one) whenever one pops an accepting state 
in the blue DFS. A red DFS only visits blue states, that is states already visited 
by the blue DFS. We will show that if a red DFS initiated from an accepting 
state r terminates without finding a counterexample then no state reachable 
from r may be part of an accepting path. Hence, the color of all states reachable 
from r may be set to black. This is the purpose of the black DFS. 

The DFSs used define, at any time, a current path from the initial state to 
the current state. For convenience, this current path is stored in a global variable 
cp. Actually, this is not necessary with our recursive presentation, since it may 
be obtained as a by-product of the execution stack when the counterexample is 
found. (For efficiency, SPIN uses an iterative implementation of the DFS, and 
stores the current path in a global variable.) 

Each state $s \in S$ is represented by a structure and the algorithm requires the
following additional fields. The extra cost of these data is only 3 bits for each
state, while the nested DFS implemented in SPIN only needs 2 bits per state.

- Color color, initially white.

- Boolean is_in_cp, initially false. This flag is used to test in constant time
  whether a state belongs to the current path.

When we write for all $t \in E(s)$ in the algorithms (see e.g. Algorithm 1),
we assume that the successors $\{t \in S \mid (s, t) \in E\}$ of $s$ are returned in a fixed
order, which is in particular the same in DFS_blue and DFS_red. This fact is
important for the correctness of Algorithm 1. We establish simultaneously the
following invariants.

Lemma 1. (1) Invariant for DFS_blue: no black state is part of a simple accepting
path and all states reachable from a black state are also black.

(2) Invariant for DFS_red, initiated from DFS_red(r) with $r \in F$: either no state
reachable from $r$ is part of a simple accepting path, or there is a simple accepting
path going through $r$ and using no black or grey state.

Proof. (1) During DFS_blue(s), if we execute line 8 then all successors of s 
are black and the result is clear by induction. Now, assume that we execute 
line 11. Then DFS_red(s) was executed completely and the color of s is grey. 
Using (2) (with r = s) we deduce that no state reachable from s is part of a 
simple accepting path. Hence, after executing DFS_black(s), the invariant is still 
satisfied. 

Algorithm 1  A version of the nested DFS algorithm: the color-DFS

void DFS_blue(State s)
 1: push(cp, s); s->is_in_cp := true; s->color := blue
 2: for all t ∈ E(s) do
 3:     if (t->is_in_cp and t ∈ F) then exit with cp · t as counterexample
 4:     else if (t->color = white) then DFS_blue(t) end if
 5: end for
 6: pop(cp); s->is_in_cp := false
 7: if (∀t ∈ E(s), t->color = black) then
 8:     s->color := black
 9: else if (s ∈ F) then
10:     DFS_red(s)
11:     DFS_black(s)
12: end if

void DFS_red(State s)
 1: push(cp, s); s->is_in_cp := true; s->color := red
 2: for all t ∈ E(s) do
 3:     if (t->is_in_cp and (t ∈ F or t->color = blue)) then
 4:         exit with cp · t as counterexample
 5:     else if (t->color = blue) then
 6:         DFS_red(t)
 7:     end if
 8: end for
 9: pop(cp); s->is_in_cp := false
10: s->color := grey
/*
 * Note that line 10 of DFS_red is not part of the actual algorithm.
 * Its purpose is simply to clarify the correctness proof.
 * Therefore there are actually only four colors as stated in the
 * description above.
 */

void DFS_black(State s)
 1: s->color := black
 2: for all t ∈ E(s) do
 3:     if (t->color ≠ black) then DFS_black(t) end if
 4: end for
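The pseudocode of Algorithm 1 can be turned into a small C program. The version below is a sketch under stated assumptions, not the authors' implementation: the automaton is stored in fixed-size successor arrays, all names are hypothetical, "exit with counterexample" is modeled by returning true up the recursion, and the grey color (used only in the proof) is omitted.

```c
#include <stdbool.h>

#define MAXS 16                     /* hypothetical bound on the number of states */

enum color { WHITE, BLUE, RED, BLACK };

struct automaton {
    int n;                          /* states are numbered 0..n-1 */
    int nsucc[MAXS];                /* number of successors of each state */
    int succ[MAXS][MAXS];           /* successors, in a fixed order */
    bool accepting[MAXS];           /* membership in F */
};

static enum color color[MAXS];
static bool in_cp[MAXS];
int cp[2 * MAXS + 1], cp_len;       /* current path; the counterexample on success */

static void dfs_black(const struct automaton *a, int s)
{
    color[s] = BLACK;
    for (int i = 0; i < a->nsucc[s]; i++)
        if (color[a->succ[s][i]] != BLACK)
            dfs_black(a, a->succ[s][i]);
}

static bool dfs_red(const struct automaton *a, int s)
{
    cp[cp_len++] = s; in_cp[s] = true; color[s] = RED;
    for (int i = 0; i < a->nsucc[s]; i++) {
        int t = a->succ[s][i];
        if (in_cp[t] && (a->accepting[t] || color[t] == BLUE)) {
            cp[cp_len++] = t;       /* cp . t is the counterexample */
            return true;
        }
        if (color[t] == BLUE && dfs_red(a, t))
            return true;
    }
    cp_len--; in_cp[s] = false;
    return false;
}

static bool dfs_blue(const struct automaton *a, int s)
{
    bool all_black = true;

    cp[cp_len++] = s; in_cp[s] = true; color[s] = BLUE;
    for (int i = 0; i < a->nsucc[s]; i++) {
        int t = a->succ[s][i];
        if (in_cp[t] && a->accepting[t]) {
            cp[cp_len++] = t;       /* cp . t is the counterexample */
            return true;
        }
        if (color[t] == WHITE && dfs_blue(a, t))
            return true;
    }
    cp_len--; in_cp[s] = false;
    for (int i = 0; i < a->nsucc[s]; i++)
        if (color[a->succ[s][i]] != BLACK)
            all_black = false;
    if (all_black) {
        color[s] = BLACK;           /* lines 7-8 of DFS_blue */
    } else if (a->accepting[s]) {
        if (dfs_red(a, s))          /* line 10 */
            return true;
        dfs_black(a, s);            /* line 11 */
    }
    return false;
}

/* Returns true and leaves an accepting path in cp[0..cp_len-1] if one exists. */
bool find_counterexample(const struct automaton *a, int init)
{
    for (int s = 0; s < a->n; s++) {
        color[s] = WHITE;
        in_cp[s] = false;
    }
    cp_len = 0;
    return dfs_blue(a, init);
}
```

On the three-state automaton with transitions 0→1, 1→2, 2→1 and accepting state 2, the blue DFS reaches state 2, and the red DFS started there immediately finds that successor 1 is blue and on the current path, so the reported counterexample is 0, 1, 2, 1. If the transition 2→1 is removed, every state is painted black on the way back and no counterexample is reported.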



(2) This is the difficult part. First, note that when entering DFS_red(r) there
are no grey states and we get property (2) directly from (1). Now, this invariant
may only be affected by the execution of line 10 inside some DFS_red(s). When
executing this statement, all successors of $s$ are either black, grey, or red. Note
that a red successor of $s$ is necessarily on the current path between $r$ and $s$
since the states on $cp(r)$ are still blue, where $cp(r)$ is the current path when
DFS_red(r) was called.

Assume that there exists a simple accepting path $\alpha$ going through $r$ and
using no black or grey state. Note that all paths using no black state and going
from $r$ to an accepting state must cross $cp(r) \cdot r$. This is due to the postfix order
of the calls DFS_red(t) for $t \in F$. Since we can reach an accepting state, following
$\alpha$ from $r$, unwinding $\alpha$ once if necessary, we get a path $\beta$ from $r$ to $cp(r) \cdot r$ using
no black or grey state. The path $cp(r) \cdot \beta$ is a simple accepting path using no
black or grey state.

If $s \notin \beta$ then the invariant still holds after setting the color of $s$ to grey in
line 10. Assume now that $s \in \beta$ and let $t$ be the successor of $s$ on the path $\beta$.
The color of $t$ must be red. Let $v$ be the last state of $\beta$ whose color is red and
write $\beta = \beta_1 v \beta_2$. Since the color of $v$ is red, it is on the current path between
$r$ and $s$ and $cp(r) \cdot r$ is a prefix of $cp(v) \cdot v$. Therefore, $cp(v) \cdot v\beta_2$ is a simple
accepting path using no grey or black states and does not contain $s$. Hence, the
invariant still holds after setting the color of $s$ to grey at line 10. □



Remark 1. One can prove that if a call DFS_red(r) with $r \in F$ terminates without
finding a counterexample, then all states reachable from $r$ are black or grey.
Therefore, at line 10 of DFS_red(s), we could set the color of $s$ to black directly
and remove line 11 (the call to DFS_black) in DFS_blue. This modification is
fine if we are only interested in finding the first counterexample. But when the
color of some state $s$ is set to grey, then we do not know whether $s$ is part of a
counterexample or not. In other words, one can deduce that a grey state cannot
be part of a counterexample only when the initial call DFS_red(r), with $r \in F$,
terminates. In order to avoid revisiting unnecessarily some states, the minimization
algorithm presented in Section 3 can use the fact that a black state cannot
be part of a counterexample. This is why we do not use this modification.

Since the algorithm visits a state at most 3 times, Algorithm 1 terminates. 
Moreover, one gets as a corollary of Lemma 1 the following statement. 

Proposition 1. If a Büchi automaton A admits a counterexample, then Algorithm 1
finds a counterexample on input A.



2.1 Comparison with SPIN's Algorithm

The difference between our algorithm and SPIN's is that SPIN does not paint
states in black to avoid unnecessary revisits of states. More precisely, in SPIN's
algorithm, lines 7 to 12 of DFS_blue are replaced with

if s ∈ F then r := s; DFS_red(s) endif

where r is a global variable used to memorize the origin of the red DFS. To 
illustrate the benefit of black states, consider the automaton below. Recall that 
the transition labels indicate in which order successors are considered by the 
DFSs. With SPIN's algorithm, the large tree is visited twice. The first visit is
started with DFS_blue(2) and the second one with DFS_red(3). With our algorithm,
when DFS_blue(2) terminates, state 2 is black. Indeed, DFS_blue is called
recursively on each state of the tree accessible from state 2. All leaves of this
tree, which have no successor, are marked black at lines 7-8, and this propagates
back to state 2. Therefore the tree will not be revisited by DFS_red(3).
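
For comparison, the variant attributed to SPIN above (launch the red search from each accepting state in postorder, remembering its origin r) can be sketched as follows. This is a minimal illustration of the classical nested DFS, not SPIN's actual code: the names `nested_dfs`, `succ` (a dict mapping each state to its ordered successor list) and `accepting` (the set F) are assumptions made for the example, and no black or grey colors are used, so a subtree without accepting cycles may be explored once by each search.

```python
def nested_dfs(succ, accepting, init):
    """Return an accepting lasso as a list of states, or None (sketch only)."""
    blue, red = set(), set()
    path = []  # current path of the blue search

    def dfs_red(s, seed, trail):
        # Second search: look for a path from the seed r back to itself.
        red.add(s)
        for t in succ.get(s, ()):
            if t == seed:
                return trail + [t]
            if t not in red:
                found = dfs_red(t, seed, trail + [t])
                if found is not None:
                    return found
        return None

    def dfs_blue(s):
        blue.add(s)
        path.append(s)
        for t in succ.get(s, ()):
            if t not in blue:
                found = dfs_blue(t)
                if found is not None:
                    return found
        # Postorder: "if s in F then r := s; DFS_red(s) endif"
        if s in accepting:
            loop = dfs_red(s, s, [s])
            if loop is not None:
                return path[:-1] + loop
        path.pop()
        return None

    return dfs_blue(init)
```

For instance, on the hypothetical automaton `{1: [2], 2: [3], 3: [2]}` with accepting set `{3}`, the sketch reports the lasso 1, 2, 3, 2, 3.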




3 Finding a Minimal Counterexample 

To find a minimal counterexample, we use a depth-first search [4] which does
not necessarily stop when it reaches a state already visited. Indeed, reaching a
state s with a distance to the initial state $s_1$ smaller than for the previous visit
of s may lead to a shorter counterexample.

Therefore, in addition to the fields used in Algorithm 1, each state has an
integer field depth, storing the smallest length of current paths on which that
state occurred. This field remains infinite as long as the state has not been
visited, and it can only decrease during the algorithm. We also use an additional
variable mce, a stack of states containing the minimal counterexample found so
far. It is initially empty. At the end of the algorithm, it will contain a minimal
counterexample of the whole automaton.



3.1 SPIN's Algorithm

The current algorithm implemented in SPIN to find a small counterexample is a
variation of the nested DFS algorithm [10]. It carries on the visit below a state
either if the state is new or if it is found more quickly than during the previous
visits. (And, before popping an accepting state, it looks for a loop from that
state.) This algorithm is not guaranteed to find a minimal counterexample. The
reason is that, after finding the first counterexample, SPIN backtracks whenever
it reaches a state with a path longer than the stored distance to the initial state.
This is due to the false intuition that using a longer path will never yield a
shorter counterexample. There are two cases where this is not appropriate and
the minimal counterexample is missed. The following examples illustrate these
two cases. As before, transition labels indicate in which order they are visited.

In the automaton of Fig. 3, the first counterexample found is $s_1 s_2 s_3 s_4 s_5 s_6 s_3$.

Fig. 3. Missing the minimal counterexample: case 1

After this visit, the state depths are set as follows: $(s_1, 1)$, $(s_2, 2)$, $(s_3, 3)$,
$(s_4, 4)$, $(s_5, 5)$, $(s_6, 6)$. SPIN's algorithm then backtracks and $s_5$ is reached from
$s_1$ with depth 2. Since this is smaller than the previous depth of $s_5$, the visit
proceeds to $s_6$, which is now reached at depth 3, and then to $s_3$, reached at
depth 4. But 4 is greater than the previous depth of $s_3$ and SPIN's algorithm
would backtrack, missing the shortest counterexample, which is $s_1 s_5 s_6 s_3 s_4 s_5$.

The second case is when an accepting state is on the current path. Then, 
even if no depth was reduced after finding the first counterexample, one should 
revisit already visited states. An example is shown in Fig. 4. 




Fig. 4. Missing the minimal counterexample: case 2 



The first counterexample found (during the second depth-first search from $s_2$)
is $s_1 s_2 s_3 s_4 s_1$ and the state depths are $(s_1, 1)$, $(s_2, 2)$, $(s_3, 3)$, $(s_4, 2)$. Now, when
we reach $s_4$ from $s_2$ with the current path $s_1 s_2 s_4$, no depth has been reduced and
again SPIN's algorithm would backtrack, missing the shortest counterexample,
which is $s_1 s_2 s_4 s_1$. In this case, the relevant length that was reduced is the length
from the accepting state $s_2$ to $s_4$ (from 2 to 1). Because memory is the most
critical resource, it is not possible to store the length from all accepting states
to each state. Therefore, we have to revisit states already visited.

To cope with these cases, Algorithm 2 has two operating modes: a normal 
one where several criteria can make the algorithm backtrack, and a more careful 
one, where the visit can only stop when either the current path loops, or becomes 
longer than the size of the minimal counterexample found so far. In this mode, 
states may be revisited several times. If the algorithm enters careful mode
while pushing a state s on the current path, it remains in this mode until that
occurrence of s is popped off the current path.

In the example of Fig. 3, we would switch to careful mode at lines 11-12 of
Algorithm 2 when visiting $s_5$ for the second time, because the field $s_5$->depth
gets reduced. In the example of Fig. 4, we would switch to careful mode at
lines 7-8 when visiting $s_2$, an accepting state.

The important fact is that being careful only in these two situations is
sufficient to catch a minimal counterexample.

3.2 An Algorithm Finding the Minimal Counterexample 

Algorithm 2 is again presented by a recursive procedure which tags states while 
visiting them. Its first argument is the state to be visited. Its second argument 
is the mode, initially normal, used for the visit. When we detect that some
counterexample might be missed in that mode, we switch to the careful mode 
by calling the procedure with careful as the second argument. The mode could 
be implemented as a global variable, which saves memory. Making it an argument 
of the procedure yields a simpler presentation of the algorithm. 

Algorithm 2 Finding a minimal counterexample
void DFS_MIN(State s, Boolean mode)
 1: push(cp, s)
 2: s->depth := min(length(cp), s->depth)
 3: for all t ∈ E(s) do
 4:   if mce = ε or (length(cp) + 1 < length(mce)) then
 5:     if t ∈ cp then
 6:       if closes_accepting(t) then mce := cp · t end if
 7:     else if (mode = careful or t ∈ F) then
 8:       DFS_MIN(t, careful)
 9:     else if t->depth = ∞ then
10:       DFS_MIN(t, mode)
11:     else if (t->depth > length(cp) + 1 and mce ≠ ε) then
12:       DFS_MIN(t, careful)
13:     end if
14:   end if
15: end for
16: pop(cp)
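
Algorithm 2 can be transcribed almost line for line. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the automaton is a hypothetical dict `succ` mapping each state to its ordered successor list, `accepting` is the set F, and `closes_accepting` uses a plain scan of the current path rather than an auxiliary stack. The two example automata in the usage note are edge sets chosen to be consistent with the descriptions of Fig. 3 and Fig. 4 in Section 3.1; the figures themselves are not reproduced, and the choice of state 3 as the accepting state of the first one is an assumption.

```python
import math

NORMAL, CAREFUL = "normal", "careful"

def find_minimal_counterexample(succ, accepting, init):
    """Sketch of DFS_MIN; returns mce as a list of states ([] if none)."""
    depth = {}   # state -> smallest length of a current path it occurred on
    cp = []      # current path
    mce = []     # minimal counterexample found so far (empty = epsilon)

    def closes_accepting(t):
        # cp . t is accepting iff the cycle from t back to t on cp
        # contains an accepting state (linear scan, assumption of this sketch).
        i = cp.index(t)
        return any(x in accepting for x in cp[i:])

    def dfs_min(s, mode):
        nonlocal mce
        cp.append(s)                                        # line 1
        depth[s] = min(len(cp), depth.get(s, math.inf))     # line 2
        for t in succ.get(s, ()):                           # line 3
            if not mce or len(cp) + 1 < len(mce):           # line 4
                if t in cp:                                 # line 5
                    if closes_accepting(t):                 # line 6
                        mce = cp + [t]
                elif mode == CAREFUL or t in accepting:     # line 7
                    dfs_min(t, CAREFUL)                     # line 8
                elif depth.get(t, math.inf) == math.inf:    # line 9
                    dfs_min(t, mode)                        # line 10
                elif depth[t] > len(cp) + 1 and mce:        # line 11
                    dfs_min(t, CAREFUL)                     # line 12
        cp.pop()                                            # line 16

    # Initial mode: careful iff the initial state is accepting.
    dfs_min(init, CAREFUL if init in accepting else NORMAL)
    return mce
```

With `{1: [2, 5], 2: [3], 3: [4], 4: [5], 5: [6], 6: [3]}` and accepting set `{3}`, the sketch first records the path 1, 2, 3, 4, 5, 6, 3 and finally returns 1, 5, 6, 3, 4, 5, entering careful mode through line 11 when state 5 is reached at a smaller depth. With `{1: [4, 2], 2: [3, 4], 3: [4], 4: [1]}` and accepting set `{2}`, it returns 1, 2, 4, 1, entering careful mode through line 7 at the accepting state 2.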



In the description of Algorithm 2, we use the following functions: 

- int length(p) returns the length of the path p (i.e., its number of states).
  Since we only use it with cp and mce as arguments, one can maintain their
  lengths in two global variables, hence we may assume that this call requires
  O(1)-time.

- Boolean closes_accepting(t) returns true iff $cp \cdot t$ is an accepting path
  (assuming that cp itself is not accepting). To implement this function, one
  can use another stack recording, for each state s of the current
  path cp, the depth in cp of the last accepting state of cp located at or before
  s. For instance, if the current path is $[s_1, s_2, s_3, s_4, s_5, s_6]$ and only $s_2$, $s_5$
  are accepting, then this stack contains [0, 2, 2, 2, 5, 5] (where 0 means that
  there is no accepting state). The function closes_accepting can then be
  implemented:

  • in O(1)-time if we accept to store the depth of each state on the current
    path. To check that a state closing a cycle creates an accepting cycle,
    one checks that the depth of its occurrence on the current path is at most
    the depth of the last accepting state on the current path.

  • in O(n)-time otherwise, where n is the length of the current path. Nevertheless,
    the additional stack still gives useful information to avoid visiting
    the current path. For instance, if s->depth (which will be smaller than
    the depth of s in cp) is larger than the depth of the last accepting state
    on the current path, or if there is no accepting state on it, we know that
    s does not close an accepting path.
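
The constant-time variant can be sketched as a small path structure. This is only an illustration: the class name `CurrentPath` and its fields are hypothetical, the path is assumed to be simple (as guaranteed by the test at line 5 of Algorithm 2), and the comparison is taken non-strict so that an accepting state closing its own cycle is counted, which is the reading suggested by the [0, 2, 2, 2, 5, 5] example.

```python
class CurrentPath:
    """Current path with O(1) closes_accepting (illustrative sketch)."""

    def __init__(self, accepting):
        self.accepting = set(accepting)
        self.states = []        # the path itself
        self.depth_on_cp = {}   # state -> 1-based depth on the path
        self.last_acc = []      # depth of last accepting state at or before each state

    def push(self, s):
        self.states.append(s)
        d = len(self.states)
        self.depth_on_cp[s] = d
        prev = self.last_acc[-1] if self.last_acc else 0
        self.last_acc.append(d if s in self.accepting else prev)

    def pop(self):
        s = self.states.pop()
        del self.depth_on_cp[s]
        self.last_acc.pop()
        return s

    def closes_accepting(self, t):
        # t closes an accepting cycle iff some accepting state lies at or
        # after the occurrence of t on the current path.
        if t not in self.depth_on_cp:
            return False
        return self.depth_on_cp[t] <= self.last_acc[-1]
```

The extra cost is one integer per state on the current path, which is small compared with the state table itself.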

3.3 Correctness of the Algorithm

To prove that Algorithm 2 is correct, we introduce the lexicographic ordering
on paths starting from the initial state of the automaton. Recall that if a state
has k outgoing transitions, they are labeled from 1 to k according to the order
in which they will be processed by the algorithm. Let $\lambda : S \times S \to \mathbb{N}$ be the function
assigning to each edge its label. We extend $\lambda$ to paths starting at $s_1$ by letting
$\lambda(s_1) = \varepsilon$ and $\lambda(s_1, s_2, \ldots, s_n) = \lambda(s_1, s_2)\lambda(s_2, s_3) \cdots \lambda(s_{n-1}, s_n)$. If $\gamma$ and $\gamma'$ are two
paths starting at $s_1$, we say that $\gamma$ is lexicographically smaller than $\gamma'$, denoted
$\gamma \prec_{lex} \gamma'$, if $\lambda(\gamma)$ is lexicographically smaller than $\lambda(\gamma')$ (with the usual order
over $\mathbb{N}$). We let $\gamma \preceq_{lex} \gamma'$ iff $\gamma \prec_{lex} \gamma'$ or $\gamma = \gamma'$.
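
As a small illustration of this ordering, the label word of a path can be computed from the successor order and compared directly. The sketch assumes the automaton is given as a hypothetical dict `succ` whose successor lists are already in processing order, so that the label of an edge is its 1-based position in the list; Python tuples compare lexicographically, with a proper prefix smaller than its extensions.

```python
def label_word(succ, path):
    """Word of edge labels along a path (labels are 1-based positions)."""
    return tuple(succ[a].index(b) + 1 for a, b in zip(path, path[1:]))

def lex_less(succ, p, q):
    """True iff path p is lexicographically smaller than path q."""
    return label_word(succ, p) < label_word(succ, q)
```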

The first observation is that the algorithm discovers paths in increasing
lexicographic order. In other words, each call to DFS_MIN makes the current path
greater in the lexicographic ordering.

Lemma 2. Let $\alpha$ and $\beta$ be the values of cp after two consecutive executions of
line 1 of Algorithm 2. Then, $\alpha \prec_{lex} \beta$.

Proof. First observe that the test at line 5 guarantees that the current path
cp remains simple: DFS_MIN will not be called on a state that would close the
current path. Let $\alpha = s_1 s_2 \cdots s_\ell$. Then either no state is popped before the next
execution of line 1, and $\beta$ is of the form $\alpha s_{\ell+1}$, hence $\alpha \prec_{lex} \beta$. Or $1 \le k < \ell$
states are first popped, and the algorithm then pushes t on the current path.
By definition of the transition labeling $\lambda$, t is a successor of $s_{\ell-k}$ such that
$\lambda(s_{\ell-k}, s_{\ell-k+1}) < \lambda(s_{\ell-k}, t)$. Hence, the new value of cp is $\beta = s_1 s_2 \cdots s_{\ell-k} t$
and $\alpha \prec_{lex} \beta$. □



Corollary 1. Algorithm 2 halts on any input. 

Proof. There is a finite number of simple paths in a finite graph, cp takes its 
values in this finite set and each recursive call makes it greater. □ 

Since Algorithm 2 discovers an increasing sequence of paths in the lexicographic
ordering, it is natural to introduce the following sequence $(\gamma_i)_{0 \le i \le p}$. Let
$\mathcal{S}$ be the finite set of simple accepting paths. Recall that a simple accepting path
is of the form $\alpha s \beta s$ with $\alpha s \beta$ simple and $s\beta \cap F \neq \emptyset$. Since the lexicographic
ordering is total, we can define a sequence $(\gamma_i)_{0 \le i \le p}$ as follows:

$\gamma_0 = \min_{\preceq_{lex}} \mathcal{S}$   if $\mathcal{S} \neq \emptyset$

$\gamma_{i+1} = \min_{\preceq_{lex}} \{\gamma \in \mathcal{S} \mid |\gamma| < |\gamma_i|\}$   if $\{\gamma \in \mathcal{S} \mid |\gamma| < |\gamma_i|\} \neq \emptyset$

where $|\gamma|$ denotes the length of $\gamma$. By construction, the last element $\gamma_p$ of this
sequence is an accepting path of minimal length. Note that the sequence $\gamma_0, \ldots, \gamma_p$
is increasing in the lexicographic ordering and decreasing in length. For $\alpha, \beta \in \mathcal{S}$,
we let $\alpha \sqsubseteq \beta$ if $\alpha \preceq_{lex} \beta$ and $|\alpha| \le |\beta|$. We shall use the following simple fact.

Fact 1. Each $\gamma_i$ is $\sqsubseteq$-minimal in $\mathcal{S}$.
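
To make the definition of the sequence concrete, the following sketch computes it from an explicit list of label words, one per simple accepting path. It is an illustration only: the function name `gamma_sequence` is hypothetical, and the length of a path is taken to be its number of states, i.e. the length of its label word plus one. Scanning the words in lexicographic order and keeping each one that is strictly shorter than the last kept word yields exactly the successive minima of the definition.

```python
def gamma_sequence(label_words):
    """Successive lexicographic minima of strictly decreasing length."""
    best_len = float("inf")
    seq = []
    for word in sorted(set(label_words)):   # increasing lexicographic order
        length = len(word) + 1              # number of states on the path
        if length < best_len:               # strictly shorter than the previous element
            seq.append(word)
            best_len = length
    return seq
```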

Given a path $\alpha$, we let $\min(\alpha) = \min\{i \mid \alpha \text{ is a prefix of } \gamma_i\}$ and $\max(\alpha) =
\max\{i \mid \alpha \text{ is a prefix of } \gamma_i\}$. By convention, $\min(\alpha) = \infty$ and $\max(\alpha) = -\infty$ if
$\alpha$ is not a prefix of some $\gamma_i$.

The next proposition implies in particular that Algorithm 2 is correct, since
the last value taken by mce is precisely $\gamma_p$. It also shows what would be the
behavior of a variant of our algorithm which outputs the successive values of
mce. Although the time consumption of the algorithm is high, the algorithm can
output all counterexamples of the sequence $\gamma_i$ when they are discovered, and the
user can stop the search at any time. For instance, Dekker's algorithm produces
about 80 counterexamples.

Proposition 2. The successive values taken by the variable mce during the
execution of Algorithm 2 are $\gamma_{-1} = \varepsilon, \gamma_0, \ldots, \gamma_p$.

Proposition 2 is a direct consequence of Proposition 3 below. Indeed, DFS_MIN
is initially called with the parameters ($s_1$, normal) if $s_1 \notin F$ and ($s_1$, careful)
if $s_1 \in F$. If there exists a counterexample, the hypotheses of Proposition 3 are
fulfilled at the beginning of the algorithm, with $s = s_1$, $\delta = \varepsilon$ and $k = 0$. Hence,
at the end of the initial call of DFS_MIN on $s_1$, the value of mce is $\gamma_{\max(s_1)} = \gamma_p$.

Proposition 3. Let $\delta s$ be a strict prefix of $\gamma_k$, with $k = \min(\delta s) < \infty$. Assume
that at the beginning of a call DFS_MIN(s, mode), we have $cp = \delta$ and
that for all prefixes $\delta_1 r$ of $\delta s$ we had $mce = \gamma_{\min(\delta_1 r)-1}$ at the beginning of
the call DFS_MIN(r, _). Then, at the end of the call DFS_MIN(s, mode), we have
$mce = \gamma_{\max(\delta s)}$. Moreover, whenever the variable mce is updated, it is switched
from some $\gamma_{\ell-1}$ to $\gamma_\ell$ with $\ell \ge 0$.

The proof of this proposition in turn uses Lemma 3. 

Proof. Let $T = \{t \in E(s) \mid \min(\delta s t) < \infty\}$. Since $\delta s$ is a strict prefix of $\gamma_{\min(\delta s)}$,
we have $T \neq \emptyset$. Write $T = \{t_1, \ldots, t_n\}$ with $\lambda(s, t_i) < \lambda(s, t_{i+1})$ for all $1 \le i < n$.
We use an induction on $|\gamma_k| - |\delta s| \ge 1$.

Claim. If before line 4 when considering $t_i \in E(s)$ we have $mce = \gamma_{\min(\delta s t_i)-1}$
then after line 14 of this iteration we have $mce = \gamma_{\max(\delta s t_i)}$.

Let $t = t_i$ and $\ell = \min(\delta s t)$. Either $\ell = 0$ and $mce = \varepsilon$, or $|cp| + 1 = |\delta s t| \le
|\gamma_\ell| < |mce|$ and the test line 4 succeeds.

The first case is when $t \in cp = \delta s$. Then, we have $\gamma_\ell = \delta s t$ and t closes an
accepting path. Therefore, mce is updated to $\gamma_\ell$. For any other successor r of s
with $\lambda(s, r) > \lambda(s, t)$, we have $|\delta s r| = |\gamma_\ell| = |mce|$, hence the test line 4 fails.
Therefore, the value of mce remains $\gamma_\ell$ until the end of the call DFS_MIN(s, _).
Moreover, from $\gamma_\ell = \delta s t$ we deduce that $i = n$ and $\gamma_\ell = \gamma_{\max(\delta s t_n)} = \gamma_{\max(\delta s)}$,
which proves the claim.

The second case is when $t \notin cp = \delta s$. All hypotheses of Lemma 3 are fulfilled,
hence DFS_MIN(t, _) is called. When DFS_MIN(t, _) is called, $\delta s t$ is a strict prefix
of $\gamma_\ell$, $cp = \delta s$, $mce = \gamma_{\ell-1} = \gamma_{\min(\delta s t)-1}$ and for all prefixes $\delta_1 r$ of $\delta s$ we had
$mce = \gamma_{\min(\delta_1 r)-1}$ at the beginning of the call DFS_MIN(r, _). Therefore, the
hypotheses of Proposition 3 are fulfilled and since $|\gamma_\ell| - |\delta s t| < |\gamma_k| - |\delta s|$ we
get by induction that $mce = \gamma_{\max(\delta s t)}$ at the end of the call DFS_MIN(t, _). The
claim is proved.

Now, we show by induction on i that before line 4 when considering $t_i \in E(s)$
we have $mce = \gamma_{\min(\delta s t_i)-1}$.

Note that $k = \min(\delta s) = \min(\delta s t_1)$. By definition of $\gamma_k$, no successor t of
s with $\lambda(s, t) < \lambda(s, t_1)$ may be such that $\delta s t$ is on a simple accepting path of
length less than $|\gamma_{k-1}|$ (with the convention $|\gamma_{-1}| = \infty$). Hence, the value of mce
remains $\gamma_{k-1}$ until $t_1 \in E(s)$ is considered. The property holds for $i = 1$.

Assume now that the property holds for some $i < n$. From the claim, we get
$mce = \gamma_{\max(\delta s t_i)}$ after the iteration for $t_i \in E(s)$. Let $q = \max(\delta s t_i)$. Note that
$\min(\delta s t_{i+1}) = \max(\delta s t_i) + 1 = q + 1$. By definition of $\gamma_{q+1}$ and of the set T, no
successor v of s with $\lambda(s, t_i) < \lambda(s, v) < \lambda(s, t_{i+1})$ may be such that $\delta s v$ is on a
simple accepting path of length less than $|\gamma_q|$. Hence, the value of mce remains
$\gamma_q$ until $t_{i+1} \in E(s)$ is considered and the property still holds for $i + 1$.

Finally, before line 4 when considering $t_n \in E(s)$ we have $mce = \gamma_{\min(\delta s t_n)-1}$.
Using the claim, we get $mce = \gamma_{\max(\delta s t_n)}$ after the iteration for $t_n \in E(s)$. Note
that $\max(\delta s t_n) = \max(\delta s)$. By definition of the set T, no successor v of s with
$\lambda(s, t_n) < \lambda(s, v)$ may be such that $\delta s v$ is on a simple accepting path of length
less than $|\gamma_{\max(\delta s)}|$. Hence, the value of mce remains $\gamma_{\max(\delta s)}$ until the end of
DFS_MIN(s, mode) and the proposition is proved. □

The proof of the next lemma uses auxiliary results (Lemmas 5 and 6 below) 
on paths that are totally independent of the algorithm. 

Lemma 3. Let $\delta s t$ be a simple path with $\ell = \min(\delta s t) < \infty$. Assume that, while
considering $t \in E(s)$ in DFS_MIN(s, _), we have $cp = \delta s$ and $mce = \gamma_{\ell-1}$ and
that for all prefixes $\delta_1 r$ of $\delta s$ we had $mce = \gamma_{\min(\delta_1 r)-1}$ at the beginning of the
call DFS_MIN(r, _). Then, DFS_MIN(t, _) is called.

Proof. We let $\alpha' = \delta s$. Assume first that $\ell = 0$, so that $\alpha' t$ is a prefix of $\gamma_0$. Since
$mce = \gamma_{-1} = \varepsilon$, the test line 4 succeeds. Since $\alpha' t$ is simple and $cp = \alpha'$, the test
line 5 fails. Assume that the test line 7 fails. Then, the mode of the algorithm
is necessarily normal, and in particular there is no accepting state on $cp = \alpha'$.
Moreover, t is not accepting: $\alpha' t \cap F = \emptyset$. If the test line 9 also fails, then t has
already been visited along a simple path that we denote $\beta' t$. By Lemma 2, we
have $\beta' t \prec_{lex} \alpha' t$. This situation is impossible by Lemma 6. Hence the test line
9 must succeed and DFS_MIN(t, _) is called in this case.

Assume now that $\ell \neq 0$. We have $|cp| + 1 = |\alpha' t| \le |\gamma_\ell|$ since $\alpha' t$ is a prefix of
$\gamma_\ell$. Further, $|\gamma_\ell| < |\gamma_{\ell-1}|$ by definition of the sequence $(|\gamma_i|)_i$. Since $mce = \gamma_{\ell-1}$,
we deduce that the test line 4 succeeds. Since $\alpha' t$ is simple and $cp = \alpha'$, the test
line 5 fails. Assume that the test line 7 fails. We show as before that $\alpha' t \cap F = \emptyset$.
Assume that the test line 9 also fails. Then, there exists a simple path $\beta' t$ such
that $\beta' t \prec_{lex} \alpha' t$ and $|\beta' t| = $ t->depth. If $|\beta' t| \ge |\gamma_\ell|$, then t->depth $= |\beta' t| >
|\alpha' t| = |cp| + 1$ and DFS_MIN(t, _) is called on line 11.

Assume now that $|\beta' t| < |\gamma_\ell|$. We want to apply Lemma 5 with $\alpha = \gamma_\ell$.
Recall that $\gamma_\ell$ is a simple accepting path which is $\sqsubseteq$-minimal in $\mathcal{S}$. Note that
$\alpha' t$ is a prefix of $\alpha$ which satisfies property (1) of Lemma 5 and it remains to
show that $\alpha' t$ is minimal with this property. Since the algorithm is still in mode
normal (the test line 7 failed), for all prefixes $\delta_1 r$ of $\alpha'$, we had r->depth $= \infty$
when the call to DFS_MIN(r, normal) was made. By hypothesis, at the beginning
of this call, we had $mce = \gamma_{\ell_1 - 1}$ where $\ell_1 = \min(\delta_1 r)$. Assume that there is a
simple path $\beta'_1 r \prec_{lex} \delta_1 r$. Let $\beta'' \sqsubseteq \beta'_1$ be $\sqsubseteq$-minimal with this property. Since
$\beta'' r$ was not visited before $\delta_1 r$, and since $\beta''$ is $\sqsubseteq$-minimal, we have $|\beta'_1 r| \ge
|\beta'' r| \ge |\gamma_{\ell_1 - 1}| > |\gamma_\ell| = |\alpha|$ and property (1) of Lemma 5 does not hold for $\delta_1 r$.
Therefore, $\alpha' t$ is the shortest prefix of $\alpha = \gamma_\ell$ satisfying (1) and Lemma 5 implies
that $|\beta'| > |\alpha'|$. We conclude as above that DFS_MIN(t, _) is called on line 11. □

Lemma 4. Let $\delta = \alpha s \beta s$ be a path with $s\beta \cap F \neq \emptyset$, $\alpha s$ simple and $\alpha s \cap \beta = \emptyset$.
We can construct a simple accepting path $\delta' = \alpha' s' \beta' s'$ such that $|\delta'| \le |\delta|$ and
$\alpha s$ is a prefix of $\alpha' s'$.

Proof. Assume that $\delta$ is not a simple accepting path. Let t be the first state
occurring twice in $\beta$. Then, we write $\beta = \beta_1 t \beta_2 t \beta_3$ with $t \notin \beta_1 \beta_2$. If $s\beta_1 t \beta_3 \cap F \neq \emptyset$
then we let $\delta' = \alpha s \beta_1 t \beta_3 s$. The path $\delta'$ still satisfies the hypotheses of the lemma
(with the same $\alpha$ and s) and we have $|\delta'| < |\delta|$. Hence we can conclude by
induction. Otherwise, we let $\delta' = \alpha s \beta_1 t \beta_2 t$. Again, $\delta'$ still satisfies the hypotheses
of the lemma and $|\delta'| < |\delta|$. Since $\alpha s$ is a prefix of $\alpha s \beta_1 t$, we can again conclude
by induction. □



Lemma 5. Let $\alpha \in \mathcal{S}$ be a simple accepting path which is $\sqsubseteq$-minimal in $\mathcal{S}$.
Assume that there exists a prefix $\alpha' t$ of $\alpha$ satisfying

$\alpha' t \cap F = \emptyset$, $\beta' t \prec_{lex} \alpha' t$ and $|\beta' t| < |\alpha|$ for some simple path $\beta' t$. (1)

For the shortest prefix $\alpha' t$ of $\alpha$ satisfying (1) we have $|\beta'| > |\alpha'|$.

Proof. Let $\alpha_1$ be the greatest common prefix of $\alpha'$ and $\beta'$. We write $\alpha' = \alpha_1 \alpha_2$
and $\beta' = \alpha_1 \beta_2$. Note that the transition between the last state of $\alpha_1$ and the
first state of $\beta_2 t$ is strictly smaller than the transition between the last state of
$\alpha_1$ and the first state of $\alpha_2 t$. Hence, for all nonempty prefixes $\beta'_2$ of $\beta_2 t$ and $\alpha'_2$
of $\alpha_2 t$, we have $\alpha_1 \beta'_2 \prec_{lex} \alpha_1 \alpha'_2$.

Assuming by contradiction that $|\beta_2| \le |\alpha_2|$ we will build a simple accepting
path $\beta$ with $\beta \prec_{lex} \alpha$ and $|\beta| \le |\alpha|$, a contradiction with the $\sqsubseteq$-minimality of $\alpha$.

Write $\alpha = \alpha' t \alpha''$ and let s be the first state on $\beta_2 t$ which occurs
also on $t\alpha''$. We write $\beta_2 t = \beta'_2 s \beta''_2$ and $\alpha'' = \alpha_3 s \alpha_4$ with $s \notin \alpha_3$.
Below, whenever we state that a path is simple, this follows from the
definition of s and $\alpha_3$ and from the fact that $\alpha$ and $\beta' t$ are simple.

1. Assume that $\alpha_3 s$ contains a final state. Then, $\delta = \alpha_1 \beta'_2 s \beta''_2 \alpha_3 s$ is an accepting
path which is not necessarily simple. Yet, $\beta' t = \alpha_1 \beta'_2 s \beta''_2$ is simple. Hence $\alpha_1 \beta'_2 s$
is simple. Moreover, $\alpha_1 \beta'_2 s \cap \beta''_2 \alpha_3 = \emptyset$ since $\alpha$ and $\beta_2 t$ are simple and by definition
of s and $\alpha_3$. Applying Lemma 4, we obtain a simple accepting path $\beta$ with
$|\beta| \le |\delta| \le |\alpha|$ and $\alpha_1 \beta'_2 s$ is a prefix of $\beta$. We deduce that $\beta \prec_{lex} \alpha$ as announced.

Assume now that $\alpha_3 s \cap F = \emptyset$, so that $\alpha_4 \cap F \neq \emptyset$, and let v be the last state
of $\alpha$. By definition of an accepting path, v is also the seed of the accepting loop.

2. We first show that v does not occur in $\alpha_2$. Assume by contradiction that $\alpha_2 =
\alpha'_2 v \alpha''_2$ and consider the simple path $\delta = \alpha_1 \beta'_2 s \alpha_4 = \delta' v$. We have $\delta' v \prec_{lex} \alpha_1 \alpha'_2 v$
and $|\delta' v| < |\alpha|$, a contradiction with the fact that t is the first such state.

3. If v does not occur in $t\alpha_3$ then we let $\beta = \alpha_1 \beta'_2 s \alpha_4$.

4. If v occurs in $t\alpha_3$ then we write $t\alpha_3 = \alpha'_3 v \alpha''_3$ and we let $\beta = \alpha_1 \beta'_2 s \alpha_4 \alpha''_3 s$.

In both cases, we can check that $\beta$ is a simple accepting path and that
$\beta \prec_{lex} \alpha$ and $|\beta| \le |\alpha|$ as desired. □



Lemma 6. If $\alpha' t$ is a prefix of $\gamma_0$ with $\alpha' t \cap F = \emptyset$ then there is no simple path
$\beta' t$ with $\beta' t \prec_{lex} \alpha' t$.

Proof. Assume by contradiction that there exists a prefix $\alpha' t$ of $\gamma_0$ such that
$\alpha' t \cap F = \emptyset$ and $\beta' t \prec_{lex} \alpha' t$ for some simple path $\beta' t$. In the following, we assume
that $\alpha' t$ is the shortest such prefix of $\gamma_0$. We will build a simple accepting path
$\beta$ with $\beta \prec_{lex} \gamma_0$, a contradiction with the definition of $\gamma_0$.

We proceed exactly as in the proof of Lemma 5. The only difference is in
case (2) when v occurs in $\alpha_2$. Here we let $\beta = \alpha_1 \beta'_2 s \alpha_4 \alpha''_2 t \alpha_3 s$. Note that we
may have $|\beta| > |\gamma_0|$ but we can show that $\beta$ is a simple accepting path with
$\beta \prec_{lex} \gamma_0$, a contradiction with the $\preceq_{lex}$-minimality of $\gamma_0$. □



3.4 Remarks on the Algorithm 

To keep the presentation simple, we have described the algorithm starting from 
a fresh input. However, one can also start from an automaton already tagged by 
Algorithm 1. Since no counterexample can go through a black state, this allows 
us to backtrack in the depth-first search as soon as a black state is seen. This
obviously shortens the search by cutting useless parts of the automaton.

Moreover, one can also bound the search by the size of the counterexample
produced by Algorithm 1. More precisely, Algorithm 2 is well suited for bounded
model-checking. One can give it a bound B for the depth of the search, and it
would find successively the counterexamples $\gamma_\ell, \gamma_{\ell+1}, \ldots, \gamma_p$ where $\ell$ is the first
index such that $|\gamma_\ell| < B$. This amounts only to replacing the test of line 4 by

(mce = ε and (length(cp) + 1 < B)) or (length(cp) + 1 < length(mce))
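
As a sketch, the bounded test can be isolated in a small predicate. The name `may_extend` and the convention of passing `None` for an empty mce are assumptions made for this illustration; the strict comparisons follow the test as printed above.

```python
def may_extend(cp_len, mce_len, bound):
    """Bounded version of the test at line 4 of Algorithm 2 (sketch).

    cp_len  -- length of the current path
    mce_len -- length of mce, or None while mce is still empty
    bound   -- the depth bound B
    """
    if mce_len is None:
        # No counterexample yet: only the bound B limits the search.
        return cp_len + 1 < bound
    # Afterwards: only strictly shorter counterexamples are of interest.
    return cp_len + 1 < mce_len
```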



4 Implementation and Experimental Results 

The algorithm presented in Section 2 is quite efficient and visits each state 
at most twice (in view of Remark 1) in order to find a first counterexample. 
The second algorithm on the other hand finds the shortest counterexample at 
the expense of revisiting states much more often. In the worst case, its time 
complexity is exponential. In order to get the best of the two, we could start
with the first algorithm until a first counterexample is found (if any) and then
switch to the second algorithm to find the shortest counterexample. 

In the prototype used to obtain the experimental results presented below, we
actually used SPIN's algorithm for finding the first counterexample instead of our
algorithm presented in Section 2. Then we switch to our minimization algorithm
of Section 3. The reason is that more in-depth changes have to be carried out on
SPIN's code to implement our algorithm of Section 2 and our primary goal was
just to minimize the size of the counterexample. We are currently implementing
the algorithm of Section 2 and, since it is always more efficient than SPIN's,
more improvements can be expected.

In the synchronized product between the model and the LTL automaton 
built by SPIN, there is a strict alternation between transitions of the model and 
transitions of the LTL automaton (see [9]). Therefore all accepting paths are of 
odd length and when minimizing the size of a counterexample we can replace 
line 4 of Algorithm 2 by length(cp) + 2 < length(mce). The test line 4 works
in fact for an arbitrary Büchi automaton. This trivial optimization is important
for our algorithm since it may revisit states quite often.

We have conducted experiments for various algorithms and specifications.
Experiments for which the model does not satisfy the specification and
counterexamples exist are gathered in Table 1. We compare our algorithm with SPIN -i,
which tries to reduce the size of the counterexample. Clearly SPIN -i does not
find the shortest counterexample while we have proved in Section 3 that our
algorithm does. The automata of the specifications (never-claims) have been
generated by the tool LTL2BA [6] both for the verification with SPIN -i and
with our algorithm. For each experiment, we show, in addition to the size of the
minimal counterexample found, the number of different states visited by the
algorithms (states stored). The last information (states matched) is the number of
states (re)visited during the algorithm. Here, each time a state is (re)visited this
counter is incremented. The execution time which is indicated is the user time
(the system time is negligible in all cases) obtained on a Pentium III 700 MHz
with 1Gb of RAM and 1Gb of cache.

Table 1. Experiments for various algorithms when a counterexample does exist

                                          SPIN -i        Contrex
Peterson          counterexample length   55             19
                  states stored           80             85
                  states matched          1968           9469
                  computation time        0.030s         0.070s
Dekker            counterexample length   173            23
                  states stored           539            543
                  states matched          48593          2.5*10^6
                  computation time        0.240s         11.420s
Dijkstra          counterexample length   5              5
(3 users)         states stored           211258         209687
                  states matched          1.96928e+09    654246
                  computation time        71m27.700s     1.780s
Hyman             counterexample length   97             17
                  states stored           123            157
                  states matched          7389           40913
                  computation time        0.080s         0.210s

As expected, our algorithm needs in general to revisit more states than SPIN's
in order to really find the minimal counterexample. Yet, there are cases where
SPIN's algorithm is less efficient than ours. The reason is that SPIN -i does
not test whether a counterexample already exists (test mce ≠ ε, line 11 of
Algorithm 2). For Dijkstra's algorithm, there is no counterexample in the left part
of the graph. Therefore, until the first counterexample has been found, our
algorithm does not switch to careful mode, even if a state's depth gets lowered.



5 Conclusion and Open Problems 

The main contribution of this paper is the algorithm presented in Section 3
which finds a shortest accepting path in a Büchi automaton. It actually finds
a sequence of counterexamples of decreasing length which it can output. It has
been implemented and the comparison with SPIN's algorithm clearly demonstrates
its superiority in counterexample length. We also proposed an algorithm
to find a counterexample, without trying to minimize its length, which is more
efficient than SPIN's. It avoids unnecessary revisits of states and hence finds
a counterexample more quickly. We plan to implement this algorithm and to
compare it experimentally with SPIN. Further, this algorithm detects states
that cannot be part of an accepting path (black states). Hence, using it instead
of SPIN's before searching for a minimal counterexample should improve the
performance of our second algorithm.

Finding a shortest counterexample with a depth-first search algorithm is 
time consuming because we need to revisit states many times. A general goal for 
improving the efficiency is to detect more states that need not be revisited. 

Another important issue is to be able to deal with partial order reductions. 
While the first nested-DFS algorithm [5] failed in the presence of partial order 
reductions, the version of [10] is able to cope with some reductions. We need 
to investigate whether our algorithms can handle partial order reductions with 
reasonable memory requirements. 

Finally, it would be interesting to find ways to minimize the length of the 
counterexample with respect to the model and the LTL specification. Instead, 
existing algorithms search for counterexamples for the model and a specific
automaton associated with the LTL specification. It is often the case that this
specific automaton is not optimal for finding a short counterexample for the 
LTL formula. 

Abstract. We propose a new framework for black-box conformance
testing of real-time systems, where specifications are modeled as
non-deterministic and partially-observable timed automata. We argue that
such a model is essential for ease of modeling and expressiveness of
specifications. The conformance relation is a timed extension of the
input-output conformance relation of [29]. We argue that it is better suited
for testing than previously considered relations such as bisimulation,
must/may preorder or trace inclusion. We propose algorithms to generate
two types of tests for this setting: analog-clock tests which measure
dense time precisely and digital-clock tests which measure time with a
periodic clock. The latter are essential for implementability, since only
finite-precision clocks are available in practice. We report on a prototype
tool and a small case study.



1 Introduction 

Testing is a fundamental step in any development process. It consists in applying 
a set of experiments to a prototype system, with multiple aims, from checking 
correct functionality to measuring performance. In this paper, we are interested 
in so-called black-box conformance testing, where the aim is to check conformance
of a system to a given specification. The system under test is “black-box” in the 
sense that we do not have a model of it, thus, can only rely on its observable 
input/output behavior. 

Formal testing frameworks have been proposed (e.g., see [10]), where
specifications are described in models with precise semantics and mathematical
relations between such models define conformance. Then, under the assumption
that the system under test (or implementation) can be modeled in the given
framework, a set of tests can be automatically derived from the specification to
test conformance of the (unknown) model. A number of issues arise, regarding
the appropriateness of the models and conformance relation, the correctness of
the testing process, its adequacy, its efficiency, and so on. Tools for test
generation have been developed for various languages and models, both untimed (e.g.,
see [17,3,13]) and timed (e.g., see [8,15,12,20,25,26,19]).

* Work partially supported by European IST projects “Next TTA” under project No
IST-2001-32111 and “RISE” under project No IST-2001-38117, and by CNRS STIC 
project “CORTOS”. 



S. Graf and L. Mounier (Eds.): SPIN 2004, LNCS 2989, pp. 109-126, 2004. 
In this paper, we propose a new testing framework for real-time systems,
based on timed automata [1]. Existing works based on similar models (e.g., [14, 
16,25,28,11,23,19]) present two major limitations. 

First, only restricted subclasses of timed automata are considered. This is 
problematic, since it limits the class of specifications that can be expressed. 
For example, [28,19] consider timed automata where outputs are isolated and 
urgent. The first condition states that, at any given state, the automaton can 
only output a single action. Therefore, a specification such as “when input a
is received, output either b or c” cannot be expressed in this model. Worse,
the second condition states that, at any given state, if an output is possible,
then time cannot elapse. This essentially means that outputs must be emitted
at precise points in time. Therefore, a specification such as “when input a is
received, output b must be emitted within at most 10 time units” cannot be
expressed. Most other works consider deterministic or determinizable subclasses 
of timed automata. For instance, [25] use event-recording automata [2] and [23] 
use a determinizable timed automata model with restricted clock resets. It is 
also typically assumed that specifications are fully-observable, meaning that all
events can be observed by the tester. 

The second limitation concerns implementability of tests. Only analog-clock 
tests are considered in the works above. These are tests which can observe the 
time of inputs precisely and can also react by emitting outputs at precise points
in time. For example, a test like “emit output a at time 1; if at time 5 input b
is received, announce PASS and stop, otherwise, announce FAIL” is an analog-
clock test. Analog-clock tests are problematic, since they are difficult, if not 
impossible, to implement with finite-precision clocks. The tester which imple- 
ments the test of the example above must be able to emit a precisely at time 
1 and check whether b occurred precisely at time 5. However, the tester will 
typically sample its inputs periodically, say, every 0.1 time units, thus, it cannot 
distinguish between b arriving anywhere in the interval (4.9, 5.1). 

In this paper, we lift the above limitations. Our main contributions are the 
following. 

First, we develop a framework which can fully handle non-deterministic and
partially observable specifications. Such specifications arise often in practice: 
when the model is built compositionally, component interactions are typically 
non-observable to the external world; abstraction from low-level details often
results in non-determinism. In general, timed automata cannot be determinized [1]
and non-observable actions cannot be removed [5]. It can be argued that in
practice many models will be determinizable. However, checking this (and performing
the determinization) is undecidable [32]. Thus, it is important to offer a
modeling framework which is general enough to relieve the user of the burden of
performing determinization “manually”.

Second, we propose a conformance relation, called timed input-output conformance
or tioco, inspired by the “untimed” conformance relation ioco of [29].
According to ioco, A conforms to B if for each observable behavior specified in B,
the set of possible outputs of A after this behavior is a subset of the set of
possible outputs of B. tioco is simply defined by including time delays in the set
of observable outputs. This makes it possible to capture the fact that an
implementation producing an output too early or too late (or never, whereas it
should) is non-conforming.
A number of different conformance relations have been considered in previous 
works. [28] use bisimulation (which in that case reduces to trace equivalence, 
because of determinism). Bisimulation is also used in [12]. [25] use a must/may
preorder. A must/may testing criterion is also considered in [20]. [19] use trace
inclusion. [23] use an adaptation of ioco which, under the hypotheses of the model,
is shown to be equivalent to trace inclusion. We argue that tioco is more appro- 
priate for conformance testing than the above conformance relations, because it 
leaves more design freedom to potential implementations (see Section 3). 

Finally, we consider both analog-clock and digital-clock (or periodic-sampling) 
tests. Analog-clock tests can measure precisely the delay between two events, 
whereas digital-clock tests can only count how many “ticks” of a periodic clock 
have occurred between the two events. Digital-clock tests are clearly more realistic
to implement. Analog-clock tests can still be useful, however, for instance
when the implementation is discrete-time but its time step is not known a priori.

The issue of determinization arises during test generation, since most algo- 
rithms rely on an implicit determinization of the specification. This presents 
problems for analog-clock test generation, due to the fact that timed automata 
are not determinizable in general, as mentioned above. To deal with the problem, 
we follow the idea of [31]: the automaton is “determinized” on-the-fly, during
test generation and execution. The algorithm uses standard symbolic reachability 
techniques for timed automata. With a simple modification of the specification 
model, similar techniques can be used to generate digital-clock tests. The latter 
can be generated either on-the-fly or off-line, in which case they are represented 
as finite trees. We discuss a simple heuristic to reduce the size of these trees by 
eliminating chains of ticks. We also briefly discuss coverage, proposing a heuristic 
to generate a test suite which covers the edges of the specification automaton. 

We have implemented our test-generation algorithms in a prototype tool, 
called TTG. The tool is built on top of the IF environment [7] and uses the 
modeling language of the latter. This language allows one to specify systems of many
processes communicating through message passing or shared variables and also 
includes features such as hierarchy, priorities, dynamic creation and complex 
data types. We have applied TTG to a small case study, presented in Section 6. 
We have also applied TTG to test behaviors of the K9 Martian Rover executive
of NASA [9]. The results of TTG on this case study are reported in [4].

The rest of this paper is organized as follows. Section 2 reviews timed au- 
tomata and timed automata with inputs and outputs. Section 3 introduces the 
testing framework. Section 4 defines analog and digital-clock tests. Section 5 
presents the test generation methods for the two types of tests. Section 6 dis- 
cusses a prototype implementation and illustrates the method on a small case 
study. Section 7 presents the conclusions and future work plans. 

2 Timed Automata

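Let $\mathsf{R}$ be the set of non-negative reals. Given a finite set of actions $\mathsf{Act}$, the
set $(\mathsf{Act} \cup \mathsf{R})^*$ of all finite-length real-time sequences over $\mathsf{Act}$ will be denoted
$\mathsf{RT}(\mathsf{Act})$. $\epsilon \in \mathsf{RT}(\mathsf{Act})$ is the empty sequence. Given $\mathsf{Act}' \subseteq \mathsf{Act}$ and $\rho \in \mathsf{RT}(\mathsf{Act})$,
$P_{\mathsf{Act}'}(\rho)$ denotes the projection of $\rho$ to $\mathsf{Act}'$, obtained by “erasing” from $\rho$ all
actions not in $\mathsf{Act}'$. For example, if $\mathsf{Act} = \{a, b\}$, $\mathsf{Act}' = \{a\}$ and $\rho = a\,1\,b\,2\,a\,3$,
then $P_{\mathsf{Act}'}(\rho) = a\,3\,a\,3$. The time spent in a sequence $\rho$, denoted $\mathsf{time}(\rho)$, is the
sum of all delays in $\rho$, for example, $\mathsf{time}(\epsilon) = 0$ and $\mathsf{time}(a\,1\,b\,0.5) = 1.5$.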
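
The projection and time operators can be illustrated with a minimal Python sketch. The list encoding of a real-time sequence (strings for actions, numbers for delays) and the function names `project` and `time_of` are assumptions made for this illustration only; they are not part of the paper's formalism.

```python
from typing import List, Set, Union

Item = Union[str, float]  # assumed encoding: an action name or a delay


def project(rho: List[Item], keep: Set[str]) -> List[Item]:
    """Erase actions not in `keep`; delays that become adjacent are added up."""
    out: List[Item] = []
    for item in rho:
        if isinstance(item, str):
            if item in keep:
                out.append(item)
        elif out and not isinstance(out[-1], str):
            out[-1] += item            # merge with the preceding delay
        else:
            out.append(float(item))
    return out


def time_of(rho: List[Item]) -> float:
    """Sum of all delays in the sequence."""
    return sum(x for x in rho if not isinstance(x, str))


rho = ["a", 1.0, "b", 2.0, "a", 3.0]       # the sequence a 1 b 2 a 3
print(project(rho, {"a"}))                 # ['a', 3.0, 'a', 3.0]
print(time_of(["a", 1.0, "b", 0.5]))       # 1.5
```

Erasing b leaves the delays 1 and 2 adjacent, which is why the projection shows a single delay of 3 between the two occurrences of a.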

We use timed automata [1] with deadlines to model urgency [27,6]. A timed
automaton over $\mathsf{Act}$ is a tuple $A = (Q, q_0, X, \mathsf{Act}, E)$ where: $Q$ is a finite set of
locations; $q_0 \in Q$ is the initial location; $X$ is a finite set of clocks; $E$ is a finite set
of edges. Each edge is a tuple $(q, q', \psi, r, d, a)$, where $q, q' \in Q$ are the source and
destination locations; $\psi$ is the guard, a conjunction of constraints of the form
$x \,\#\, c$, where $x \in X$, $c$ is an integer constant and $\# \in \{<, \le, =, \ge, >\}$; $r \subseteq X$ is
a set of clocks to reset to zero; $d \in \{\mathsf{lazy}, \mathsf{delayable}, \mathsf{eager}\}$ is the deadline; and
$a \in \mathsf{Act}$ is the action. We will not allow eager edges with guards of the form
$x > c$.

A timed automaton $A$ defines an infinite labeled transition system (LTS).
Its states are pairs $s = (q, v)$, where $q \in Q$ and $v : X \to \mathsf{R}$ is a clock valuation.
$\vec{0}$ is the valuation assigning 0 to every clock of $A$. $S_A$ is the set of all states
and $s_0^A = (q_0, \vec{0})$ is the initial state. There are two types of transitions. Discrete
transitions of the form $(q, v) \xrightarrow{a} (q', v')$, where $a \in \mathsf{Act}$ and there is an edge
$(q, q', \psi, r, d, a)$, such that $v$ satisfies $\psi$ and $v'$ is obtained by resetting to zero
all clocks in $r$ and leaving the others unchanged. Timed transitions of the form
$(q, v) \xrightarrow{t} (q, v + t)$, where $t \in \mathsf{R}$, $t > 0$ and there is no edge $(q, q'', \psi, r, d, a)$, such
that: either $d = \mathsf{delayable}$ and there exist $0 \le t_1 < t_2 \le t$ such that $v + t_1 \models \psi$
and $v + t_2 \not\models \psi$; or $d = \mathsf{eager}$ and $v \models \psi$. We use notation such as $s \xrightarrow{a}$, $s \not\xrightarrow{a}$,
..., to denote that there exists $s'$ such that $s \xrightarrow{a} s'$, there is no such $s'$, and so
on. This notation extends to timed sequences, in the usual way. A state $s \in S_A$
is reachable if there exists $\rho \in \mathsf{RT}(\mathsf{Act})$ such that $s_0^A \xrightarrow{\rho} s$. The set of reachable
states of $A$ is denoted $\mathsf{Reach}(A)$.
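
The effect of deadlines on timed transitions can be sketched for a deliberately simplified case. The sketch assumes a single clock x and guards of the closed-interval form lo <= x <= hi; the names `Edge` and `can_delay` are hypothetical. It follows the definition above literally, so an eager edge blocks time only when its guard holds at the current valuation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Edge:
    lo: float       # assumed guard shape: lo <= x <= hi, for a single clock x
    hi: float
    deadline: str   # "lazy", "delayable" or "eager"


def can_delay(x: float, t: float, edges: Iterable[Edge]) -> bool:
    """True if a timed transition of duration t > 0 exists from clock value x."""
    for e in edges:
        # eager: time is blocked as soon as the guard holds
        if e.deadline == "eager" and e.lo <= x <= e.hi:
            return False
        # delayable: the delay may not leave the guard once it has been enabled,
        # i.e. some x + t1 satisfies the guard and some later x + t2 <= x + t does not
        if e.deadline == "delayable" and x <= e.hi < x + t:
            return False
    return True


edge = Edge(lo=2, hi=8, deadline="delayable")
print(can_delay(0.0, 8.0, [edge]))   # True: x may reach 8
print(can_delay(0.0, 8.5, [edge]))   # False: would pass the upper bound 8
```

For interval guards the delayable condition reduces to the single comparison `x <= hi < x + t`; lazy edges never restrict the passage of time.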

Timed Automata with Inputs and Outputs: In the rest of the paper, we assume
given a set of actions $\mathsf{Act}$, partitioned in two disjoint sets: a set of input actions
$\mathsf{Act}_{in}$ and a set of output actions $\mathsf{Act}_{out}$. We also assume there is an unobservable
action $\tau \notin \mathsf{Act}$. Let $\mathsf{Act}_\tau = \mathsf{Act} \cup \{\tau\}$.

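A timed automaton with inputs and outputs (TAIO) is a timed automaton
over $\mathsf{Act}_\tau$. A TAIO is called observable if none of its edges is labeled
by $\tau$. A TAIO $A$ is called input-complete if it can accept any input at any
state: $\forall s \in \mathsf{Reach}(A) \,.\, \forall a \in \mathsf{Act}_{in} \,.\, s \xrightarrow{a}$. It is called deterministic if
$\forall s, s', s'' \in \mathsf{Reach}(A) \,.\, \forall a \in \mathsf{Act}_\tau \,.\, s \xrightarrow{a} s' \wedge s \xrightarrow{a} s'' \Rightarrow s' = s''$. It is called non-blocking if

$$\forall s \in \mathsf{Reach}(A) \,.\, \forall t \in \mathsf{R} \,.\, \exists \rho \in \mathsf{RT}(\mathsf{Act}_{out} \cup \{\tau\}) \,.\, \mathsf{time}(\rho) = t \wedge s \xrightarrow{\rho}. \qquad (1)$$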

This condition guarantees that A will not block time in any environment. 
The set of observable timed traces of a TAIO $A$ is defined to be

$$\mathsf{Traces}(A) = \{P_{\mathsf{Act}}(\rho) \mid \rho \in \mathsf{RT}(\mathsf{Act}_\tau) \wedge s_0^A \xrightarrow{\rho}\}. \qquad (2)$$

3 Specifications, Implementations, and Conformance 

We now describe our testing framework. We assume that the specification of the
system to be tested is given as a non-blocking TAIO $A_S$. We assume that the
implementation (i.e., the system to be tested) can be modeled as a non-blocking,
input-complete TAIO $A_I$. Notice that we do not assume that $A_I$ is known, simply
that it exists. Input-completeness is required so that the implementation can
accept inputs from the tester at any state (possibly ignoring them or moving to
an error state, in case of illegal inputs).

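In order to define the conformance relation, we define a number of operators.
Given a TAIO $A$ and $\sigma \in \mathsf{RT}(\mathsf{Act})$, $A \text{ after } \sigma$ is the set of all states of $A$ that can
be reached by some timed sequence $\rho$ whose projection to observable actions is
$\sigma$. Formally:

$$A \text{ after } \sigma = \{s \in S_A \mid \exists \rho \in \mathsf{RT}(\mathsf{Act}_\tau) \,.\, s_0^A \xrightarrow{\rho} s \wedge P_{\mathsf{Act}}(\rho) = \sigma\}. \qquad (3)$$

Given state $s \in S_A$, $\mathsf{elapse}(s)$ is the set of all delays which can elapse from $s$
without $A$ making any observable action. Formally:

$$\mathsf{elapse}(s) = \{t > 0 \mid \exists \rho \in \mathsf{RT}(\{\tau\}) \,.\, \mathsf{time}(\rho) = t \wedge s \xrightarrow{\rho}\}. \qquad (4)$$

Given state $s \in S_A$, $\mathsf{out}(s)$ is the set of all observable “events” (outputs or the
passage of time) that can occur when the system is at state $s$. The definition
naturally extends to a set of states $S$. Formally:

$$\mathsf{out}(s) = \{a \in \mathsf{Act}_{out} \mid s \xrightarrow{a}\} \cup \mathsf{elapse}(s), \qquad \mathsf{out}(S) = \bigcup_{s \in S} \mathsf{out}(s). \qquad (5)$$

The timed input-output conformance relation, denoted tioco, is defined as

$$A_I \text{ tioco } A_S \;\equiv\; \forall \sigma \in \mathsf{Traces}(A_S) \,.\, \mathsf{out}(A_I \text{ after } \sigma) \subseteq \mathsf{out}(A_S \text{ after } \sigma). \qquad (6)$$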

Due to the fact that implementations are assumed to be input-complete, it can be 
easily shown that tioco is a transitive relation, that is, if A tioco B and B tioco C 
then A tioco C. It can be also shown that checking tioco is undecidable. This 
is not a problem for black-box testing: since $A_I$ is unknown, we cannot check
conformance directly, anyway. 

Examples: Before we proceed to define tests, we give some examples that illustrate
the meaning of our testing framework. In the examples, input actions are
denoted a?, b?, etc., and output actions are denoted a!, b!, etc. Unless otherwise
mentioned, deadlines of output edges are delayable and deadlines of input edges
are lazy. In order not to overload the figures, we do not always draw input-complete
automata. We assume that implementations ignore the missing inputs
(this can be modeled by adding self-loop edges covering these inputs).

Fig. 1. Examples of specifications and implementations.

Consider the specification $\mathsf{Spec}_1$ shown in Figure 1. $\mathsf{Spec}_1$ could be expressed
in English as follows: “after the first a received, the system must output b no
earlier than 2 and no later than 8 time units”. Implementations $\mathsf{Impl}_1$ and $\mathsf{Impl}_2$
conform to $\mathsf{Spec}_1$. $\mathsf{Impl}_1$ produces b exactly 5 time units after reception of a.
$\mathsf{Impl}_2$ produces b within 4 to 5 time units. $\mathsf{Impl}_3$ and $\mathsf{Impl}_4$ do not conform
to $\mathsf{Spec}_1$. $\mathsf{Impl}_3$ may produce a b after 1 time unit, which is too early. $\mathsf{Impl}_4$
fails to produce a b at all. Formally, $\mathsf{out}(\mathsf{Impl}_3 \text{ after } a?) = (0, 4] \cup \{b\}$ and
$\mathsf{out}(\mathsf{Impl}_4 \text{ after } a?) = (0, \infty)$, whereas $\mathsf{out}(\mathsf{Spec}_1 \text{ after } a?) = (0, 8]$.
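
The inclusion check behind these two counterexamples can be sketched as follows. The class `OutSet` is a hypothetical encoding that represents an out-set as a delay interval (0, max_delay] plus a finite set of outputs; the value used for Impl_1 is derived here from the English description (b exactly 5 time units after a) and is an assumption. The sketch checks the single trace a? only, whereas tioco quantifies over all traces of the specification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OutSet:
    """Assumed encoding of out(...): delays (0, max_delay] plus some outputs."""
    max_delay: float                      # math.inf stands for the interval (0, oo)
    outputs: frozenset = frozenset()

    def subset_of(self, other: "OutSet") -> bool:
        # (0, d1] is included in (0, d2] iff d1 <= d2; outputs compare as sets
        return self.max_delay <= other.max_delay and self.outputs <= other.outputs


spec1 = OutSet(8.0)                       # out(Spec_1 after a?)
impl3 = OutSet(4.0, frozenset({"b!"}))    # out(Impl_3 after a?) = (0,4] U {b}
impl4 = OutSet(math.inf)                  # out(Impl_4 after a?) = (0,oo)
impl1 = OutSet(5.0)                       # assumed: Impl_1 lets 5 time units pass

print(impl3.subset_of(spec1))   # False: b is offered too early
print(impl4.subset_of(spec1))   # False: time may elapse beyond the deadline
print(impl1.subset_of(spec1))   # True on this trace
```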

Now consider specification $\mathsf{Spec}_2$ shown in Figure 2. This specification could
be written down as: “if the first input is a then the system should output b within
10 time units; if the first input is c then the system should either output d within
5 time units or, failing to do that, output e within 7 time units”. The second
branch of $\mathsf{Spec}_2$ is a typical specification of a timeout. If the “normal” result
d does not appear for some time, the system itself should recognize the error
and output an error message not much later. None of the four implementations
of Figure 1 conform to $\mathsf{Spec}_2$, as they do not react to input c (they ignore it).
On the other hand, $\mathsf{Impl}_5$ and $\mathsf{Impl}_6$ of Figure 2 are conforming. It is worth
noticing that $\mathsf{Impl}_6$ may output a b some time after receiving input f. The fact
that input f does not appear in $\mathsf{Spec}_2$ does not affect the conformance of $\mathsf{Impl}_6$.
(In fact, $\mathsf{Impl}_5$ and $\mathsf{Impl}_6$ conform not only to $\mathsf{Spec}_2$ but also to $\mathsf{Spec}_1$.) This
example illustrates another property of tioco, namely, that an implementation is
free to accept inputs not mentioned in the specification and behave as it wishes
afterwards. This property is essential for capturing assumptions on the inputs
(i.e., on the environment) in the specification. This is why we do not require
specifications to be input-complete.



Fig. 2. More examples of specifications and implementations.

Comparison: [28] define conformance as timed bisimulation (TB), which in their
case reduces to timed trace equivalence (TTE), since determinism is assumed.
[25] define conformance using a must/may preorder (MMP). Neither $\mathsf{Impl}_1$ nor $\mathsf{Impl}_2$
conforms to $\mathsf{Spec}_1$ w.r.t. TB, TTE or MMP. We believe that this is too strict. 1
[23,19] define conformance as timed trace inclusion (TTI). TTI is generally 
stricter than tioco: tioco allows an implementation to accept inputs not accepted 
by the specification, whereas TTI does not. When the specification is input- 
complete, tioco and TTI are equivalent. A deterministic (and fully observable) 
specification can be made input-complete without changing its conformance se- 
mantics by adding edges covering the missing inputs and leading to a “don’t care” 
location where all inputs and outputs are accepted. This transformation is not 
always possible for non-deterministic specifications. Moreover, if the transforma- 
tion is performed, care must be taken to instruct the test generation algorithm 
not to explore the “don’t care” location, so that it does not generate useless 
tests. We opt for tioco, which avoids these complications in a simple way. For an 
extensive discussion of various untimed conformance relations, see [30]. 

4 Tests 

A test (or test case) is an experiment performed on the implementation by an 
agent (the tester). There are different types of tests, depending on the capabilities 
of the tester to observe and react to events. Here, we consider two types of 
tests (the terminology is borrowed from [18]). Analog-clock tests can measure 
precisely the delay between two observed actions and can emit an input 2 at 
any point in time. Digital-clock (or periodic-sampling) tests can only count how 
many “ticks” of a periodic clock have occurred between two actions and emit an 
input immediately after observing an action or tick. For simplicity, we assume 
that the tester and the implementation are started precisely at the same time.
In practice, this can be achieved by having the tester issue the start command
to the implementation.

1 It should be noted, however, that the issue does not arise in [28] because outputs
are assumed to be urgent, thus, $\mathsf{Spec}_1$ cannot be expressed.

2 We always use terms “input” and “output” to mean input/output of the implementation.
Thus, we write “the test emits an input” rather than “emits an output”. We
follow the same convention when drawing test automata. For example, the edge labeled
a? in the TAIO of Figure 3 corresponds to the tester emitting a, upon execution
of the test.

116 M. Krichen and S. Tripakis

It should be noted that we consider adaptive tests (following the terminology 
of [24]), where the action the tester takes depends on the observation history. 
Adaptive tests can be seen as trees representing the strategy of the tester in a 
game against the implementation. Due to restrictions in the specification model, 
which essentially remove non-determinism from the implementation strategy, 
some existing methods [28,19] generate non-adaptive test sequences. 

4.1 Analog-Clock Tests 

An analog-clock test for a specification $A_S$ over $\mathsf{Act}_\tau$ is a total function

$$T : \mathsf{RT}(\mathsf{Act}) \to \mathsf{Act}_{in} \cup \{\bot, \mathsf{pass}, \mathsf{fail}\}. \qquad (7)$$

$T(\rho)$ specifies the action the tester must take once it observes $\rho$. If $T(\rho) = a \in
\mathsf{Act}_{in}$ then the tester emits input $a$. If $T(\rho) = \bot$ then the tester waits (lets time
elapse). If $T(\rho) \in \{\mathsf{pass}, \mathsf{fail}\}$ then the tester produces a verdict (and stops). To
represent a valid test, $T$ must satisfy a number of conditions:

$$\exists t \in \mathsf{R} \,.\, \forall \rho \in \mathsf{RT}(\mathsf{Act}) \,.\, \mathsf{time}(\rho) > t \Rightarrow T(\rho) \in \{\mathsf{pass}, \mathsf{fail}\} \qquad (8)$$

$$\forall \rho \in \mathsf{RT}(\mathsf{Act}) \,.\, T(\rho) \in \{\mathsf{pass}, \mathsf{fail}\} \Rightarrow \forall \rho' \in \mathsf{RT}(\mathsf{Act}) \,.\, T(\rho \cdot \rho') = T(\rho) \qquad (9)$$

Condition (8) states that the test reaches a verdict in bounded time $t$ (called
the completion time of the test). Condition (9) is a “suffix-closure” property
ensuring that the test does not recall a verdict. We also need to ensure that the
test does not block time, for instance, by emitting an infinite number of inputs
in a bounded amount of time. This can be done by specifying certain conditions
on the LTS defined by $T$. The states of this LTS are sequences $\rho \in \mathsf{RT}(\mathsf{Act})$.
The initial state is $\epsilon$. For every $a \in \mathsf{Act}_{out}$ there is a transition $\rho \xrightarrow{a} \rho \cdot a$. There
is also a transition $\rho \xrightarrow{t} \rho \cdot t$ for every $t \in \mathsf{R}$, provided $\forall t' < t \,.\, T(\rho \cdot t') = \bot$. If
$T(\rho) = b \in \mathsf{Act}_{in}$ then there is a transition $\rho \xrightarrow{b} \rho \cdot b$. As a convention, all states $\rho$
such that $T(\rho) = \mathsf{pass}$ are “collapsed” into a single sink state pass, and similarly
with fail. We require that states of this LTS are non-blocking as in Condition (1),
unless pass or fail is reached.
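
To make the function view of a test concrete, here is a minimal Python sketch of one possible analog-clock test for the English reading of Spec_1 (“after a, output b no earlier than 2 and no later than 8 time units”). It is a hypothetical example and is not claimed to be the test of Figure 3. Sequences are encoded as lists of action strings and numeric delays, and the string "wait" stands for the waiting action; both choices are assumptions of this sketch.

```python
WAIT = "wait"   # stands for the "let time elapse" action (bottom)


def T(rho):
    """A total function from observed sequences to tester actions or verdicts."""
    if not rho:
        return "a?"                  # emit input a immediately
    if rho[0] != "a?":
        return "fail"                # anything observed before a? is unexpected
    elapsed = 0.0
    for item in rho[1:]:
        if isinstance(item, str):    # first action observed after a?
            if item == "b!" and 2 <= elapsed <= 8:
                return "pass"
            return "fail"
        elapsed += item
        if elapsed > 8:              # deadline for b missed
            return "fail"
    return WAIT


print(T([]))                    # a?
print(T(["a?", 5.0, "b!"]))     # pass
print(T(["a?", 1.0, "b!"]))     # fail (b too early)
print(T(["a?", 9.0, "b!"]))     # fail (verdict already reached after a? 9)
```

The function scans the sequence from the left and returns at the first item that determines a verdict, so extending a sequence never changes a verdict already reached (Condition (9)), and any sequence lasting longer than 8 time units receives a verdict (Condition (8)).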
Fig. 3. Analog-clock test represented as a TAIO or a function.

Analog-clock tests can sometimes be represented as TAIO. 3 For example,
the test defined in the right part of Figure 3 can be equivalently represented by
the TAIO shown in the left part. Function $T$ is partially defined in the figure.
The remaining cases are covered by the suffix-closure property of pass/fail,
Condition (9). For instance, $T(a?\,9\,b!) = \mathsf{fail}$, because $T(a?\,9) = \mathsf{fail}$.

Execution of the test $T$ on the implementation $A_I$ can be defined as the parallel
composition of the LTSs defined by $T$ and $A_I$, with the usual synchronization
rules for transitions carrying the same label. We will denote the product LTS by
$A_I \| T$. The execution of the test reaches a pass/fail verdict after bounded time.
However, since the implementation can be non-deterministic or non-observable, 
the verdict need not be the same in all experiments (i.e., runs of the product). 
To declare that the implementation passes the test, we require that all possible 
experiments lead to a pass verdict. This implies that in order to gain confi- 
dence in pass verdicts, the same test must be executed multiple times, unless 
the implementation is known to be deterministic. 

Formally, we say that $A_I$ passes the test, denoted $A_I \text{ passes } T$, if state fail is
not reachable in the product $A_I \| T$. We say that an implementation passes (resp.
fails) a set of tests (or test suite) $\mathcal{T}$ if it passes all tests (resp. fails at least one
test) in $\mathcal{T}$. We say that $\mathcal{T}$ is sound with respect to $A_S$ if $\forall A_I \,.\, A_I \text{ tioco } A_S \Rightarrow
A_I \text{ passes } \mathcal{T}$. We say that $\mathcal{T}$ is complete with respect to $A_S$ if $\forall A_I \,.\, A_I \text{ passes } \mathcal{T} \Rightarrow
A_I \text{ tioco } A_S$.

Soundness is a minimal correctness requirement. It is rather weak, since many
tests can be sound and useless (by always announcing pass). Completeness, on
the other hand, is usually impossible to achieve with a finite test suite (see
Section 5.3). We are thus motivated to define another notion. We say that a test
$T$ is strict with respect to $A_S$ if $\forall A_I \,.\, A_I \text{ passes } T \Rightarrow A_I \| T \text{ tioco } A_S$. What
the above definition says is that a strict test must not announce pass when the
implementation has behaved in a non-conforming manner during the execution
of the test. In the untimed setting, a similar notion of lax tests is proposed in [22].
The test shown in Figure 3 is sound and strict w.r.t. $\mathsf{Spec}_1$ of Figure 1. Changing
the fail state of the test into pass would yield a test which is still sound, but no
longer strict.

4.2 Digital-Clock Tests 

Consider a specification $A_S$ over $\mathsf{Act}_\tau$ and let tick be a new output action, not
in $\mathsf{Act}_\tau$. A digital-clock test (or periodic sampling test) for $A_S$ is a total function

$$D : (\mathsf{Act} \cup \{\mathsf{tick}\})^* \to \mathsf{Act}_{in} \cup \{\bot, \mathsf{pass}, \mathsf{fail}\}. \qquad (10)$$

The digital-clock test can observe all input and output actions, plus the action
tick which is assumed to be the output of the tester’s digital clock. We assume
that the initial phase of the clock is 0 and its period is 1. We further assume that
the clock is never reset, and that ticks have priority over other observable actions
(i.e., if tick and $a$ occur at the same time, tick will be always observed before
$a$). With these assumptions, if action $a$ is observed after the $i$-th and before the
$(i+1)$-st tick, then the tester knows that $a$ occurred at some time in the interval
$[i, i + 1)$.

3 But not always: the test which moves to pass once it observes a sequence of a’s
such that the time distance between two a’s is 1 cannot be captured by a timed
automaton with a bounded number of clocks. This is related to the fact that timed
automata are not determinizable whereas a test is by definition deterministic.

Validity conditions similar to those for analog-clock tests apply to digital-clock
tests as well. Due to lack of space, we omit the formal definitions. A digital-clock
test $D$ defines an LTS with states in $(\mathsf{Act} \cup \{\mathsf{tick}\})^*$ and labels in $\mathsf{Act} \cup \{\mathsf{tick}\} \cup \mathsf{R}$.
Given state $\pi$, if $D(\pi) \notin \mathsf{Act}_{in}$ then $\pi$ has a self-loop transition labeled with $t$, for
all $t \in \mathsf{R}$. The reason such transitions are missing from states such that $D(\pi) =
a \in \mathsf{Act}_{in}$ is that we assume that the digital-clock test emits $a$ immediately after
the last event in $\pi$ is observed.

Execution of a digital-clock test is defined by forming the parallel product of
three LTSs, namely, the ones of the test $D$, the implementation $A_I$, and the Tick
automaton (shown in the accompanying figure). Tick implicitly synchronizes with
$A_I$ through time. Tick explicitly synchronizes with $D$ on transitions labeled tick.
The parallel product is built so that tick transitions have priority over other
observable transitions. Thus, if $s$ is a state of the product and $s \xrightarrow{\mathsf{tick}}$
then $s$ has no other outgoing transition. The definition of passes for digital-clock
tests is similar to the one for analog-clock tests, with $A_I \| T$ being replaced by
$A_I \| \mathsf{Tick} \| D$. The definitions of soundness, completeness and strictness also carry
over in the natural way.
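
What a digital-clock tester observes can be sketched as a function that turns a timed run into a sequence over Act ∪ {tick}. The sketch assumes that the i-th tick occurs at absolute time i (phase 0, period 1, first tick at time 1) and that events are given as (action, absolute time) pairs sorted by time; the function name `digitize` is hypothetical.

```python
import math


def digitize(timed_events):
    """Interleave ticks with actions; a tick at time t precedes an action at time t."""
    seq, ticks = [], 0
    for action, t in timed_events:
        due = math.floor(t)                    # ticks 1..floor(t) happen no later than t
        seq.extend(["tick"] * (due - ticks))   # ticks not yet reported
        ticks = max(ticks, due)
        seq.append(action)
    return seq


print(digitize([("a?", 0.0), ("b!", 2.5)]))   # ['a?', 'tick', 'tick', 'b!']
print(digitize([("a?", 0.0), ("b!", 2.0)]))   # same observation: tick has priority
```

Both runs yield the same observation, which illustrates why the tester can only conclude that b occurred somewhere in [2, 3).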

5 Test Generation



We adapt the untimed test generation algorithm of [29]. Roughly speaking, the
algorithm builds a test in the form of a tree. A node in the tree is a set of states
$S$ of the specification and represents the “knowledge” of the tester at the current
test state. The algorithm extends the test by adding successors to a leaf node, as
illustrated in the accompanying figure. For all illegal outputs (outputs which
cannot occur from any state in $S$) the test leads to fail. For each legal output $b_i$,
the test proceeds to node $S_i$, which is the set of states the specification can be in
after emitting $b_i$ (and possibly performing unobservable actions). If there exists an
input $c$ which can be accepted by the specification at some state in $S$, then the
test may decide to emit this input (dashed arrow from $S$ to $S'$). At any node, the
algorithm may decide to stop the test and label this node as pass.

Generic test-generation scheme.

Two features of the above algorithm are worth noting. First, the algorithm is
only partially specified. Indeed, a number of decisions need to be made at each
node: (1) whether to stop the test or continue, (2) whether to wait or emit an
input if possible, (3) which input, in case there are many possible inputs. Some 
of these choices can be made according to user-defined parameters, such as the 
desired depth of the test. They can also be made randomly or systematically 
using some book-keeping, in order to generate a test suite, rather than a single 
test. We discuss this option in more detail in Section 5.3. 

The second feature of the algorithm is that it implicitly determinizes the 
specification automaton. Indeed, building Si, Sj and so on corresponds to a clas- 
sical subset construction. The latter can be performed either off-line, that is, 
before the test generation, or on-line, that is, during the test generation or even 
during the test execution. Test generation during test execution has been termed 
on-the-fly and is supported by the tool TorX [3].
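
The generic scheme and its implicit subset construction can be sketched for a small untimed, finite transition system. Everything below is an illustrative assumption: the dictionary encoding of the specification, the label "tau" for unobservable steps, the fixed policy (always emit the first enabled input, stop at a given depth), and the absence of quiescence or time. It is not the algorithm of [29] itself.

```python
def tau_closure(states, trans):
    """All states reachable from `states` via unobservable "tau" steps."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for label, dst in trans.get(s, []):
            if label == "tau" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def after(S, label, trans):
    """Subset-construction step: states reachable from S by `label`, then taus."""
    step = {dst for s in S for lbl, dst in trans.get(s, []) if lbl == label}
    return tau_closure(step, trans)


def gen_test(S, trans, inputs, outputs, depth):
    """Build a test tree: dict label -> subtree, with "pass"/"fail" leaves."""
    if depth == 0:
        return "pass"                          # decision (1): stop the test
    node = {}
    for o in sorted(outputs):                  # illegal outputs lead to fail
        nxt = after(S, o, trans)
        node[o] = gen_test(nxt, trans, inputs, outputs, depth - 1) if nxt else "fail"
    enabled = [i for i in sorted(inputs) if after(S, i, trans)]
    if enabled:                                # decisions (2)/(3): assumed fixed policy
        i = enabled[0]
        node[i] = gen_test(after(S, i, trans), trans, inputs, outputs, depth - 1)
    return node


# Hypothetical specification: after a?, either b! directly or c! after a tau step.
SPEC = {"s0": [("a?", "s1")], "s1": [("tau", "s2"), ("b!", "s3")], "s2": [("c!", "s3")]}
TREE = gen_test(tau_closure({"s0"}, SPEC), SPEC, {"a?"}, {"b!", "c!"}, depth=2)
print(TREE)
```

After the input a?, the tester's knowledge is the set {s1, s2}, so both b! and c! are legal at that node even though no single specification state offers both.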



5.1 Generating Analog-Clock Tests 

Analog-clock tests cannot be represented as a finite tree, because there is an a
priori infinite set of possible observable delays at a given node. To remedy this,
we use the idea of [31]. We represent an analog-clock test as an algorithm. The
latter essentially performs subset construction on the specification automaton,
during the execution of the test. Thus, our analog-clock testing method can be
classified as on-the-fly.

More precisely, the test will maintain a set of states $S$ of the specification
TAIO, $A_S$. $S$ will be updated every time an action is observed or some time
delay elapses. Since the time delay is not known a priori, it must be an input to
the update function. We define the following operators:

$$\mathsf{dsucc}(S, a) = \{s' \mid \exists s \in S \,.\, s \xrightarrow{a} s'\} \qquad (11)$$

$$\mathsf{tsucc}(S, t) = \{s' \mid \exists s \in S \,.\, \exists \rho \in \mathsf{RT}(\{\tau\}) \,.\, \mathsf{time}(\rho) = t \wedge s \xrightarrow{\rho} s'\} \qquad (12)$$

where $a \in \mathsf{Act}$ and $t \in \mathsf{R}$. $\mathsf{dsucc}(S, a)$ contains all states which can be reached
by some state in $S$ performing action $a$. $\mathsf{tsucc}(S, t)$ contains all states which can
be reached by some state in $S$ via a sequence $\rho$ which contains no observable
actions and takes exactly $t$ time units. The two operators can be implemented
using standard data structures for symbolic representation of the state space
and simple modifications of reachability algorithms for timed automata [31].

The test operates as follows. It starts at state $S_0 = \mathsf{tsucc}(\{s_0^{A_S}\}, 0)$. Given
current state $S$, if output $a$ is received $t$ time units after entering $S$, then $S$ is
updated to $\mathsf{dsucc}(\mathsf{tsucc}(S, t), a)$. If no event is received until, say, 10 time units
later, then the test can update its state to $\mathsf{tsucc}(S, 10)$. If ever the set $S$ becomes
empty, the test announces fail. At any point, for an input $b$, if $\mathsf{dsucc}(S, b) \neq \emptyset$,
the test may decide to emit $b$ and update its state accordingly. At any point, the
test may decide to stop, announcing pass.

It can be shown that the test defined above is both sound and strict. 
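
The on-the-fly procedure can be sketched for one hand-coded specification. The sketch assumes the English reading of Spec_1 (b must follow a after at least 2 and at most 8 time units, with a delayable deadline), uses concrete states (location, clock value) instead of the symbolic representation mentioned above, and has no unobservable actions, so every state set stays finite. The location names and the `run_test` driver are hypothetical.

```python
def tsucc(S, t):
    """States reachable from S by letting exactly t time units elapse (no taus here)."""
    out = set()
    for loc, x in S:
        if loc == "wait_b" and x + t > 8:   # delayable deadline: time cannot pass x = 8
            continue
        out.add((loc, x + t))
    return out


def dsucc(S, a):
    """States reachable from S by performing observable action a."""
    out = set()
    for loc, x in S:
        if loc == "init" and a == "a?":
            out.add(("wait_b", 0.0))        # clock x is reset on a?
        elif loc == "wait_b" and a == "b!" and 2 <= x <= 8:
            out.add(("done", x))
    return out


def run_test(observations):
    """observations: (delay, action) pairs; action None means only time elapsed."""
    S = tsucc({("init", 0.0)}, 0.0)
    for delay, action in observations:
        S = tsucc(S, delay)
        if action is not None:
            S = dsucc(S, action)
        if not S:                           # no specification state explains the run
            return "fail"
    return "pass"                           # the tester decides to stop here


print(run_test([(0.0, "a?"), (5.0, "b!")]))   # pass
print(run_test([(0.0, "a?"), (1.0, "b!")]))   # fail: b too early
print(run_test([(0.0, "a?"), (9.0, None)]))   # fail: deadline for b missed
```

The verdict fail is produced exactly when the set S becomes empty, which mirrors the rule stated above; a real implementation would replace the concrete pairs by symbolic sets of clock valuations.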

5.2 Generating Digital-Clock (Periodic-Sampling) Tests

Since its set of observable events is finite ($\mathsf{Act} \cup \{\mathsf{tick}\}$), a digital-clock test can be
represented as a finite tree. In this case, we can decide whether to generate tests 
on-the-fly or off-line. This is a matter of a space/time trade-off. The on-the-fly 
method does not require space to store the generated tests. On the other hand, 
a test computed on-the-fly has a longer reaction time than a test which has been 
computed off-line. 

Independently of which option we choose, we proceed as follows. We first
form the product $A'_S = A_S \| \mathsf{Tick}$. We then define the following operator on $A'_S$:

$$\mathsf{usucc}(S) = \{s' \mid \exists s \in S \,.\, \exists \rho \in \mathsf{RT}(\{\tau\}) \,.\, s \xrightarrow{\rho} s'\}. \qquad (13)$$

$\mathsf{usucc}(S)$ contains all states which can be reached by some state in $S$ via a
sequence $\rho$ which contains no observable actions. Notice that, by construction of
$A'_S$, the duration of $\rho$ is bounded: since tick is observable and has to occur after
at most 1 time unit, $\mathsf{time}(\rho) \le 1$.

Fig. 4. A digital-clock test (top) and two alternative representations (bottom).



Finally, we apply the generic test-generation scheme presented above. The
root of the test tree is defined to be $S_0 = \{s_0^{A'_S}\}$. Successors of a node $S$ are
computed as follows. For each $a \in \mathsf{Act}_{out} \cup \{\mathsf{tick}\}$, there is an edge $S \xrightarrow{a} S'$ with
$S' = \mathsf{dsucc}(\mathsf{usucc}(S), a)$, provided $S' \neq \emptyset$, otherwise there is an edge $S \xrightarrow{a} \mathsf{fail}$.
If there exists $b \in \mathsf{Act}_{in}$ such that $S'' = \mathsf{dsucc}(\mathsf{tsucc}(S, 0), b) \neq \emptyset$, then the test
generation algorithm may decide to emit $b$ at $S$, adding an edge $S \xrightarrow{b} S''$. Notice
the asymmetry in the ways $S'$ and $S''$ are computed. The reason is that the
tester is assumed to emit an input $b$ immediately upon entering $S$. Thus, $S''$
should only contain the immediate successors of $S$ by $b$.

The tests generated in this way are guaranteed to be sound. However, they
are not strict in general. This is expected, since the tester cannot distinguish
between outputs being produced exactly at time 1 or, say, at time 1.5. A sound
(but not strict) digital-clock test for $\mathsf{Spec}_1$ of Figure 1 is shown in the top of
Figure 4.

Reducing the size of digital-clock tests: Digital-clock tests can sometimes grow 
large because they contain a number of “chains” of ticks. On the other hand, 
standard test description languages such as TTCN [21] permit the use of vari- 
ables and richer data structures. We would like to use such features to make 
the representation of digital-clock tests more “compact”. For example, the test 
shown in the top of Figure 4 can be equivalently represented as the automaton 
with counter i, shown in the bottom-left of the figure. 

Reducing the size of test representations is a non-trivial problem in general, 
related to compression and algorithmic complexity theory. In our context, we 
only use a heuristic which attempts to eliminate tick chains as much as possible. 
To this purpose, we generalize the labels of the digital-clock test to labels of
the form k tick, where k is a positive integer constant. A transition labeled with
k tick is taken when the k-th tick is received, counting from the time the source
node is entered. Naturally, tick is equivalent to 1 tick. Now, consider two nodes
S and S' such that: (1) $S \xrightarrow{k\ tick} S'$, (2) for all $a \in \mathrm{Act}$, the successors of S and S'
are identical, (3) $S' \xrightarrow{tick} S''$. In this case, we remove node S' (and corresponding
edges) and add the edge $S \xrightarrow{(k+1)\ tick} S''$. We repeat the process until no more
nodes can be removed. The result of applying this heuristic to the test in the
top of Figure 4 is shown in the bottom-right of the figure.
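
The tick-chain elimination can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the test is a tree stored as a dictionary from node names to outgoing edges, that tick edges are keyed by the pair `("tick", k)`, and that the leaves `pass` and `fail` are plain strings. As a small generalization (an assumption of this sketch), a chain link labeled k2 tick is merged into a (k + k2) tick edge, which reduces to the rule above when k2 = 1.

```python
def tick_edge(edges):
    """Return (k, target) for the tick edge of a node, or None if it has none."""
    for label, target in edges.items():
        if isinstance(label, tuple) and label[0] == "tick":
            return label[1], target
    return None

def action_edges(edges):
    """Edges labeled with ordinary actions (everything except k-tick labels)."""
    return {a: t for a, t in edges.items() if not isinstance(a, tuple)}

def compress_tick_chains(test):
    """Repeatedly merge S -(k tick)-> S' -(k2 tick)-> S'' into S -((k+k2) tick)-> S''."""
    test = {node: dict(edges) for node, edges in test.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for s in list(test):
            if s not in test:
                continue  # node was merged away earlier in this pass
            first = tick_edge(test[s])
            if first is None:
                continue
            k, mid = first
            if mid == s or mid not in test:
                continue  # self-loop or leaf (pass/fail)
            second = tick_edge(test[mid])
            if second is None or second[1] == mid:
                continue
            if action_edges(test[s]) != action_edges(test[mid]):
                continue  # condition (2) fails: action successors differ
            k2, target = second
            del test[s][("tick", k)]
            test[s][("tick", k + k2)] = target
            del test[mid]  # tree assumption: only s pointed to mid
            changed = True
    return test

# A chain of three ticks during which output "a" is a failure, then expected.
chain = {
    "n0": {("tick", 1): "n1", "a": "fail"},
    "n1": {("tick", 1): "n2", "a": "fail"},
    "n2": {("tick", 1): "n3", "a": "fail"},
    "n3": {"a": "pass", ("tick", 1): "fail"},
}
compressed = compress_tick_chains(chain)
```

On this example the three-node chain collapses into a single edge labeled 3 tick from n0 to n3; n3 is kept because its action successors differ from those of n0.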



5.3 Coverage 

It is generally impossible to generate a finite test suite which is complete, in 
particular when the specification has loops, which define an infinite set of possible 
behaviors. This is because implementations can have an arbitrary number of 
states, while a finite test suite can only explore a bounded number of states. But 
an implementation could be conforming up to a certain point and not conforming 
afterwards. 

To remedy this fact, test generation methods usually make a compromise: 
instead of generating a complete test suite, generate a test suite which covers the 
specification. 4 Different coverage criteria have been proposed for untimed sys- 
tems, such as state coverage (every state of the specification must be “explored” 
by at least one test), transition coverage (every transition must be explored), and 
so on. A survey of coverage criteria and their relationships, in the context of soft- 
ware testing, can be found in [33] . In the case of timed automata the state space 
is infinite, thus, existing methods attempt to cover either finite abstractions of
the state space, e.g., the region graph in [28,16] or a time-abstracting partition
graph in [25], or the structural elements of the specification, e.g., [19] propose
techniques for edge, location, or definition-use pair coverage and [8] consider
various criteria in the context of timed Petri nets.

4 Some methods [28,12] generate a suite which is complete w.r.t. a given upper bound
on the number of states of the implementation.

In the spirit of [19], we propose a heuristic for generating a digital-clock 
test suite covering the edges of the specification automaton. Notice that we 
cannot use the technique of [19], which is based on formulating coverage as 
a reachability problem. Indeed, this technique relies on the assumption that 
outputs in the specification are urgent and isolated, which results in tests being 
sequences, rather than trees.

Our method aims at covering edges labeled with an input action. Then, edges
labeled with outputs will also be covered, since a test must be able to accept
any output at any state. Let T be a test suite and $S \xrightarrow{a} S'$ be an edge in some
test of T, with $a \in \mathrm{Act}_{in}$. If e is an edge of $A_S$ labeled with a and enabled at
some state in S, then we say that e is covered by T. We say that T covers $A_S$ if
all input edges of $A_S$ are covered by T. Then, the test generation algorithm can
stop once it has generated a test suite covering $A_S$.
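
The coverage check itself is simple to state in code. The sketch below is an untimed simplification under stated assumptions: specification edges are triples (source, action, target), test nodes are sets of specification states, and "enabled at some state in S" is approximated by "the source of the edge belongs to S" (guards and clocks are ignored). The function names and the example data are hypothetical.

```python
def covered_edges(spec_edges, suite_edges, act_in):
    """Input edges of the specification that are exercised by some test edge."""
    covered = set()
    for S, a, _target in suite_edges:
        if a not in act_in:
            continue
        for edge in spec_edges:
            source, label, _dst = edge
            if label == a and source in S:
                covered.add(edge)
    return covered

def covers(spec_edges, suite_edges, act_in):
    """True if every input edge of the specification is covered by the suite."""
    input_edges = {e for e in spec_edges if e[1] in act_in}
    return input_edges <= covered_edges(spec_edges, suite_edges, act_in)

# Hypothetical specification with two input edges and one output edge.
spec_edges = [("l0", "touch", "l1"), ("l1", "touch", "l2"), ("l1", "dim", "l0")]
act_in = {"touch"}

suite = [(frozenset({"l0"}), "touch", frozenset({"l1"}))]
partial = covers(spec_edges, suite, act_in)   # second touch edge not yet covered

suite.append((frozenset({"l1"}), "touch", frozenset({"l2"})))
full = covers(spec_edges, suite, act_in)      # generation could stop here
```

A generator following the stopping rule above would call such a check after each new test and stop as soon as it returns true.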

6 Tool and Case Study 

We have built a prototype test-generation tool, called TTG, on top of the IF 
environment [7]. The IF modeling language allows one to specify systems consisting
of many processes communicating through message passing or shared variables
and includes features such as hierarchy, priorities, dynamic creation and complex
data types. The IF tool-suite includes a simulator, a model checker and a
connection to the untimed test generator TGV [17]. TTG is implemented inde- 
pendently from TGV. It is written in C++ and uses the basic libraries of IF for 
parsing and symbolic reachability of timed automata with deadlines. 

TTG takes as main input the specification automaton, written in the IF language,
and can generate two types of tests: (1) analog-clock tests under the assumption 
that the implementation is discrete-time and has a time step of 1; (2) digital-clock 
tests with respect to a given Tick automaton. By modifying the Tick automaton, 
the user can implement different sampling rates, model jitter in the sampling 
period, and so on. TTG can be executed in an interactive mode, where the user 
guides the test generation by resolving decision points. TTG can also be asked to 
generate a single test randomly or the exhaustive test suite, up to a user-defined 
depth. The depth of a test is the length of the longest path from the initial state
to a pass or fail state. The tests are output in the IF language.

We have applied TTG to a small case study, which is a modification of the 
light switch example presented in [19]. The (modified) specification is shown in 
Figure 5. It models a lighting device, consisting of two modules: the “Button” 
module which handles the user interface through a touch-sensitive pad and the 
“Lamp” module which lights the lamp to intensity levels “dim” or “bright”, or 
turns the light off. The user interface logic is as follows: a “single” touch means 
“one level higher”, whereas a “double” touch (two quick consecutive touches) 
means “one level lower”. Higher and lower are taken modulo three;
thus, a single touch while the light is bright turns it off.

Fig. 5. A lighting device.



The device communicates with the external world through input touch and 
outputs off, dim, bright. Events single and double are used for internal commu- 
nication between the two modules through synchronous rendez-vous and are 
non-observable to the external user. The Button module uses the timing param- 
eter D which specifies the maximum delay between two consecutive touches if 
they are to be considered as a double touch. The Lamp module uses the timing 
parameters m and M which specify the minimum and maximum delay for the 
lamp to change intensity (e.g., to warm-up a halogen bulb). In order not to over- 
load the figure, we omit most guards, resets and deadlines in the Lamp module. 
They are placed similarly to the ones shown in the figure (i.e., resets in inputs, 
guards and deadlines in outputs). 
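
The user-interface logic of the device can be sketched as follows. This is an untimed approximation for illustration only: it ignores the lamp delays m and M, and it assumes that two touches form a double touch when their distance is at most D (the exact boundary condition is given by the guards of Figure 5, which are not reproduced here). The function names are hypothetical.

```python
LEVELS = ["off", "dim", "bright"]

def classify_touches(times, D):
    """Group a sorted list of touch times into "single" and "double" events."""
    events = []
    i = 0
    while i < len(times):
        if i + 1 < len(times) and times[i + 1] - times[i] <= D:
            events.append("double")  # two quick consecutive touches
            i += 2
        else:
            events.append("single")
            i += 1
    return events

def next_level(level, event):
    """single: one level higher, double: one level lower, both modulo three."""
    step = 1 if event == "single" else -1
    return LEVELS[(LEVELS.index(level) + step) % 3]

def run(times, D, level="off"):
    """Final light level after the given touches, starting from `level`."""
    for event in classify_touches(times, D):
        level = next_level(level, event)
    return level

events = classify_touches([0.0, 0.5, 3.0], D=1)   # ["double", "single"]
final = run([0.0, 0.5, 3.0], D=1)                 # off -> bright -> off
```

The modulo arithmetic reproduces the behavior described above: a single touch while the light is bright turns it off, and a double touch while it is off makes it bright.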

We have used TTG to generate the exhaustive digital-clock test suite for the 
above specification, with parameter set D = 1, m = 1, M = 2, for various depth
levels. We have obtained 68, 180, 591 and 2243 tests, for depth levels 5, 6, 7 and 8, 
respectively. Notice that these are the sets of all possible tests up to the specified 
depth: no test selection is performed. Moreover, the current implementation is 
sub-optimal because it generates tests announcing pass before the maximum 
depth is reached. Implementation of test selection criteria is underway. One of 
the tests generated by TTG is shown in Figure 6. The drawing has been produced 
automatically using the if2eps tool by Marius Bozga. 

7 Summary and Future Work 

We have proposed a testing framework for real-time systems based on non- 
deterministic and partially-observable timed-automata specifications. To our 
knowledge, this is the first framework that can fully handle such specifications. 
We introduced a timed version of the input-output conformance relation of [29] 
and proposed techniques to generate analog-clock and digital-clock tests for this 
relation. We reported on a prototype tool and a simple case-study. 

Regarding future work, our priority is to study test selection methods, in
order to reduce the number of generated tests. To this aim, we are currently
implementing the edge-coverage heuristic discussed in Section 5.3. We are also
working on reducing the size of generated tests, implementing the reduction
heuristic for digital-clock tests discussed in Section 5.2. Another direction that
we pursue is to identify classes of specifications for which analog-clock tests can
be represented as timed automata. One such class is deterministic and observable
specifications. The advantage of a timed automata representation is that it avoids
on-the-fly reachability computation, thus reducing the reaction time of the test.

Fig. 6. A test generated automatically by TTG.

Abstract. We present a technique and a tool for model-checking operational
UML models based on a mapping of object oriented UML models
into a framework of communicating extended timed automata - in the 
IF format - and the use of the existing model-checking and simulation 
tools for this format. 

We take into account most of the structural and behavioral characteristics
of classes and their interplay and tackle issues like the combination of
operations, state machines, inheritance and polymorphism, with a particular
semantic profile for communication and concurrency. The UML
dialect considered here also includes a set of extensions for expressing
timing.

Our approach is implemented by a tool importing UML models via 
an XMI repository, and thus supporting several commercial and non- 
commercial UML editors. For user friendly interactive simulation, an 
interface has been built, presenting feedback to the user in terms of the 
original UML model. Model-checking and model exploration can be done 
by reusing the existing IF state-of-the-art validation environment. 



1 Introduction 

We present in this paper a technique and a tool for validating UML models by 
simulation and property verification. The reason why we focus on UML is that 
we feel some of the techniques which emerged in the field of formal validation are 
both essential to the reliable development of real-time and safety critical systems, 
and sufficiently mature to be integrated in a real-life development process. 

Our past experience (e.g. with the SDL language [8]) shows that this integration
can only work if validation takes into account widely used modeling
languages. Currently, UML-based model-driven development enjoys considerable
success in industry, and is supported by several CASE tools
providing editing, methodological help, code generation and other functions, but
very little support for validation.

* This work is supported by the OMEGA European Project (IST-33522). See also
http://www-omega.imag.fr



S. Graf and L. Mounier (Eds.): SPIN 2004, LNCS 2989, pp. 127-145, 2004. 

This work is part of the OMEGA IST project, whose aim is to build a basis
for a UML-based development environment for real-time and embedded systems,
including a set of notations for different aspects with common semantic foundations
and tool-supported verification methods for large systems, covering real-time
related aspects [11].



1.1 Basic Assumptions 

Before going into more detail, we state the fundamental assumptions made in
this work:

UML is broader than what we need or can handle in automatic vali- 
dation. In UML 1.4 [33] there are 9 types of diagrams and about 150 language 
concepts (metaclasses). Some of them are too informal to be useful in valida- 
tion (e.g. use cases) while for others the coherence and relationships with the 
rest of the UML model are not clearly (or uniquely) defined (e.g. collaborations, 
activity diagrams, deployment). 

In consequence, in this work we focused on a subset of UML concepts that 
define an operational view of the modeled system: objects, their structure and 
their behavior. The choices, which are not fully explained in this paper, are not 
made ad-hoc. This work is part of a broader project (IST-OMEGA [1]) which 
aims to define a consistent subset of UML (kernel language) to be used in safety
critical, real-time applications. See also [12,11]. 

UML has neither a standard nor a broadly accepted dynamic seman- 
tics. As a consequence, one facet of the OMEGA project is a quest for a suitable 
semantics for UML to be used in complex, safety critical, real-time, possibly dis- 
tributed applications. Effort is put into: finding the right concepts (e.g. commu- 
nication mechanisms between objects, concurrency model, timing specification 
features, see [12]), defining them formally (a formalization in PVS is available 
[23]) and implementing and testing these concepts in tools. 

In this paper we discuss only the problems of implementing and testing the 
semantics, while the definition and formalization are tackled in [12] and [23]. We 
describe a translation to an automata-based formalism implemented in the IF 
tool [6,9]. This results in a flexible implementation of the semantics, in which we 
can easily test the choices of the OMEGA formal semantics and propose changes. 

To produce powerful tools we have to build upon existing ones. This
motivates our choice to translate to the IF language [6,9], for which a rich
set of tools (for static analysis, model checking with various reduction techniques,
model construction and manipulation, test generation, etc.) already exists.

Our claim is that most of these tools work on UML-generated models with
only minor updates. 1

Moreover, in order to be usable, a validation tool has to accept UML models
edited with widely used CASE tools. Our choice to work on the standard XML
representation for UML (XMI) is a step in this direction.



1 At least model checking, model construction and manipulation were already tested. 

1.2 Our Approach in More Detail

In terms of language coverage, in our semantics and in our tool we focus
on the operational part of UML: classes with structural and behavioral features,
relationships (associations, inheritance), behavior descriptions through state machines
and actions. The issues we tackle, like the combination of operations and
state machines, inheritance and polymorphism, run-to-completion and concurrency,
go beyond the previous work done in this area (see section 1.3), which
has mainly focused on verification of statecharts. Our choices are outlined in
section 2.

Our implementation of the operational semantics of UML models is based on 
a mapping from UML into an intermediate formal representation IF [5] based on 
communicating extended timed automata (CETA). This choice is motivated by
the existence of a verification toolset based on this semantic model [6,9] which has
been productively used in a number of research projects and case studies, e.g.
in [7,17]. The main features of the IF language are presented in section 1.4, and 
in section 3 we discuss a mapping from UML into this model which respects the 
semantics given in [12,23]. 

An important issue in designing real-time systems is the ability to capture 
quantitative timing requirements and assumptions, as well as time dependent 
behavior. We rely on the timing extensions defined in the context of the 
OMEGA project [18,16]. We summarize these extensions and their mapping into
IF in section 4. 

Another important issue is the formalism in which properties of models 
are expressed. In section 5 we introduce a simple property description language
(observer objects) that reuses some concepts from UML (like objects, state machines)
while remaining sufficiently expressive for a large class of linear properties.
The use of concepts that are familiar to most UML users has the potential to
alleviate the culture shock of introducing formal dynamic verification to UML
models.

Finally, section 6 presents the UML validation toolset. By using the IF tools 
as underlying simulation and verification engine, the UML tools presented here 
benefit from a large spectrum of model reduction and analysis techniques already 
implemented therein, such as static analysis and optimizations for state-space 
reduction, partial order reductions, some forms of symbolic exploration, model 
minimization and comparison, etc. [6,9].

The techniques and the tool presented in this paper are subject to experimental
validation on several larger case studies within the OMEGA project
[1].

1.3 Related Work 

The application of formal analysis techniques (and particularly model checking) 
to UML has been a very active field of study in recent years, as witnessed by the 
number of papers on this subject ([29,30,28,27,26,35,14,15,37,3] are the most often
cited).

Like ourselves, most of these authors base their work on an existing model
checker (SPIN [22] in the case of [29,30,28,35], COSPAN [21] in the case of [37],
Kronos [38] for [3] and UPPAAL [25] for [26]), and on the mapping of UML to
the input language of the respective tool.

For specifying properties, some authors opt for the property language of the 
model checker itself (e.g. [28,29,30]). Others use UML collaboration diagrams 
(e.g. [26,35]) which are too weak to express all relevant properties. We propose to 
use a variant of UML state machines to express properties in terms of observers. 

Concerning language coverage, all previous approaches are restricted to flat
class structures (no inheritance) and to behaviors specified exclusively by
statecharts. In this respect, many important features which make UML an
object-oriented formalism (inheritance, polymorphism and dynamic binding of
operations) are not dealt with. Our approach is, to our knowledge, the first to try to
fill this gap.

Our starting point for handling of UML state machines (not described in
detail in this paper) was the material cited above together with previous work
on Statecharts ([20,13,31] to mention only a few). In the definition of our concurrency
model we have taken inspiration from our previous assessment of the UML
concurrency model [32], and from other positions on this topic (see for example
[36]), and we respected the operational semantics defined in the OMEGA project
[12].

1.4 The Back-End Model and Tools 

The validation approach proposed in this work is based on the formal model 
of communicating extended timed automata and on the IF environment built 
around this model [6,9,10]. We summarize the elements of this model in the 
following. 

Modeling with communicating extended automata. 

IF was developed at VERIMAG in order to provide an instrument for modeling
and validating distributed systems that can manipulate complex data and may
involve dynamic aspects and real-time constraints. Additionally, the model allows
one to describe the semantics of higher level formalisms (e.g. UML or SDL) and has
been used as a format for inter-connecting validation tools.

In this model, a system is composed of a set of communicating processes that 
run in parallel (see figure 1). Processes are instances of process types. They have 
their own identity (PID), they may own complex data variables (defined through 
ADA-like data type definitions), and their behavior is defined by a state machine. 
The state machine of a process type may use composite states and the effect of 
transitions is described using common (structured) imperative statements. 

Processes may communicate via asynchronous signals, via shared variables
or via rendez-vous. Parallel processes are composed asynchronously (i.e.
by interleaving). The model also allows dynamic creation of processes, which is 
an essential feature for modeling object systems that are by definition dynamic. 

Fig. 1. Constituents of a communicating extended automata model in IF.



The link between system execution and time progress may be described in 
a precise manner, and thus offers support for modeling real time constraints. 
We use the concepts from timed automata with urgency [4]: there are special 
variables called clocks which measure time progress and which can be used in 
transition guards. A special attribute of each transition, called urgency, specifies
if time may progress when the transition is enabled, and by how much (up to 
infinity or only as long as the time-guard of the transition remains true). 

A framework for modeling priority. 

On top of the above model, we use a framework for specifying dynamic priorities
via partial orders between processes. The framework was formalized in [2].
Basically, a system description is associated with a set of priority directives of
the form: (state condition) $\Rightarrow p_1 \prec p_2$. They are interpreted as follows: given a
system state and a directive, if the condition of the directive holds in that state,
then the process with ID $p_1$ has priority over $p_2$ for the next move (meaning that if
$p_1$ has an enabled transition, then $p_2$ is not allowed to move).
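
The interpretation of priority directives can be illustrated with a short Python sketch. It is a simplification written for this explanation, not the formalization of [2]: a directive is represented as a hypothetical triple (condition, high, low), the system state is a dictionary, and a process is blocked as soon as some applicable directive names an enabled process with priority over it. Chains of directives are not treated specially here.

```python
def allowed_to_move(state, enabled, directives):
    """Processes among `enabled` that may move next under the given directives."""
    blocked = {
        low
        for condition, high, low in directives
        if condition(state) and high in enabled and low in enabled
    }
    return set(enabled) - blocked

# One directive: while the system is in "urgent" mode, p1 has priority over p2.
directives = [(lambda st: st["mode"] == "urgent", "p1", "p2")]

urgent = allowed_to_move({"mode": "urgent"}, {"p1", "p2", "p3"}, directives)
normal = allowed_to_move({"mode": "normal"}, {"p1", "p2", "p3"}, directives)
p1_idle = allowed_to_move({"mode": "urgent"}, {"p2", "p3"}, directives)
```

In the first call p2 is held back because p1 is enabled; in the other two calls the directive either does not apply or p1 has nothing to execute, so p2 may move.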

Property specification with observers. 

Dynamic properties of IF models may be expressed using observer automata. 
These are special processes that may monitor 2 the changes in the state of a 
model (variable values, contents of queues, etc.) and the events occurring in it 
(inputs, outputs, creation and destruction of processes, etc.). 

For expressing properties, the states of an observer may be classified (syntac- 
tically) as ordinary or error. Observers may be used to express safety properties. 
A re-interpretation of success states as accepting states of a Büchi automaton
could also allow observers to express liveness properties. 

2 The semantics is that observer transitions synchronize with the transitions of the 
system. 

IF observers are rooted in the observer concept introduced by Jard, Groz and
Monin in the VEDA tool [24]. This intuitive and powerful property specification
formalism has been adapted over the past 15 years to other languages (LOTOS,
SDL) and implemented by industrial CASE tools like Telelogic’s ObjectGEODE.
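
To make the observer idea concrete, the following Python sketch monitors a sequence of events with a small automaton whose states are either ordinary or error. It is only an illustration of the principle: the class `Observer`, the event names `req` and `resp`, and the checked property are assumptions of this example, and the sketch observes a finite trace instead of synchronizing with a running IF model.

```python
class Observer:
    """A safety monitor: reaching an error state means the property is violated."""

    def __init__(self, initial, transitions, error_states):
        self.state = initial
        self.transitions = transitions        # (state, event) -> next state
        self.error_states = set(error_states)

    def step(self, event):
        """Synchronize with one system event; return True if now in an error state."""
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state in self.error_states

def first_violation(observer, trace):
    """Index of the first event that drives the observer into an error state."""
    for index, event in enumerate(trace):
        if observer.step(event):
            return index
    return None

def make_observer():
    # Property (hypothetical): every request is answered before the next request.
    transitions = {
        ("idle", "req"): "waiting",
        ("waiting", "resp"): "idle",
        ("waiting", "req"): "err",
    }
    return Observer("idle", transitions, {"err"})

ok = first_violation(make_observer(), ["req", "resp", "req", "resp"])
bad = first_violation(make_observer(), ["req", "resp", "req", "req"])
```

Events that the observer does not mention leave its state unchanged, so the monitor never restricts the behavior of the observed system.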

Analysis techniques and the IF-2 toolbox. 

The IF-2 toolbox [6,9] is the validation environment built around the formalism 
presented before. It is composed of three categories of tools: 

1. Behavioral tools for simulation, verification of properties, automatic test 
generation, model manipulation (minimization, comparison). The tools im- 
plement techniques such as partial order reductions and symbolic simulation 
of time, and thus present a good level of scalability. 

2. Static analysis tools which provide source-level optimizations that help
to further reduce the state space of the models, and thus improve the
chance of obtaining results from the behavioral tools. Among the state-of-the-art
techniques that are implemented we mention data flow analysis (e.g.
dead variable reduction), slicing and simple forms of abstraction.

3. Front-ends and exporting tools which provide source-level coupling to 
higher level languages (UML, SDL) and to other verification tools (Spin, 
Agatha, etc.). 

The toolbox has already been used in a series of industrial-size case studies
[6,9].

2 Ingredients of UML Models 

This section outlines the semantic and design-related choices with respect to the
UML concepts covered and to the computation and execution model adopted.

2.1 UML Concepts Covered 

In this work we consider an operational subset of UML, which includes the fol- 
lowing UML concepts: active and passive classes - with their operations and 
attributes, associations, generalizations - including polymorphism and dynamic 
binding of operations, basic data types, signals, and state machines. State ma- 
chines are not discussed in this paper as they are already tackled in many pre- 
vious works like [29,30,28,27,26,35,15,37,3]. 

In addition to the elements mentioned above, a number of UML extensions
for describing timing constraints and assumptions are supported. They were 
introduced in [16,18] and are discussed in section 4. 

2.2 The Execution Model 

We describe in this section some of the semantic choices made with respect to 
the computation and the concurrency model implemented by our method and 
tools. The purpose is to illustrate some of the particularities of the model and
not to give a complete, formal semantics for UML, which may be found in [12,23].

The execution model chosen in OMEGA and presented here is an extension 
of the execution model of the Rhapsody UML tool (see [19] for an overview), 
which is already used in a large number of UML applications. Other execution 
models can be accommodated in our framework by adapting the mapping to IF
accordingly. 

Activity groups and concurrency. 

There are two kinds of classes: active and passive, both being described by
attributes, relationships, operations and state machines. 

At execution, each instance of an active class defines a concurrency unit called 
activity group. Each instance of a passive class belongs to exactly one activity 
group. 

Different activity groups execute concurrently, and objects inside the same
activity group execute sequentially. Groups are sequential on purpose, in
order to have some default protection against concurrent access to shared data
(passive objects) in the group. The consequence is that requests (asynchronous
signals or operation calls) coming from other groups (or even from the same group, in
the case of asynchronous signals) are placed in a queue belonging to the activity group.
They are handled one by one when the whole group is stable.

An activity group is stable when all its objects are stable. An object is stable 
if it has nothing to execute spontaneously and no pending operation call from 
inside its group. Note that an object is not necessarily stable when it reaches a 
stable state in the state machine, as there may be transitions that can be taken 
simply upon satisfaction of a Boolean condition. 

The above notion of stability defines a notion of run-to-completion step for
activity groups: a step is the sequence of actions executed by the objects of the 
group from the moment an external request is taken from the activity group’s 
queue by one of the objects, and until the whole group becomes stable. During 
a step, other requests coming from outside the activity group are not handled 
and are queued. 

Operations, signals and state machines. 

In the UML model we distinguish syntactically between two kinds of operations: 
triggered operations and primitive operations. Reaction to triggered operation 
calls is described directly in the state machine of a class: the operation call is seen 
as a special kind of transition trigger, besides asynchronous signals. Triggered 
operations differ from asynchronous signals in that they may have a return value. 

Primitive operations have the body described by a method, with an associ- 
ated action. Their handling is more delicate since they are dynamically bound 
like in all object-oriented models. This means that, when such an operation call 
is sent to an object, the most appropriate operation implementation with respect 
to the actual type of the called object and to the inheritance hierarchy has to 
be executed. 

With respect to call initiation, an object having the control may call a primitive
operation on an object from the same activity group at any time, and the
call is stacked and handled immediately. However, in case of triggered operation
calls, the dynamic call graph between objects should be acyclic, since an object 
that has already called a triggered operation is necessarily in an unstable state 
and may not handle any more calls. This type of condition may be verified using 
the IF mapping. 

Signals sent inside an activity group are always put in the group queue for 
handling in a later run-to-completion step. This choice is made so that there is 
no intra-group concurrency created by sending signals. 
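
The run-to-completion discipline described in this section can be mimicked in a few lines of Python. The sketch below is a deliberately simplified model and rests on several assumptions: an activity group is a single queue of requests, a request is a Python function, an intra-group operation call is an ordinary (immediately executed) function call, and the group counts as stable when the handler of the current request has returned. The names `ActivityGroup`, `on_touch`, `helper` and `on_done` are hypothetical.

```python
from collections import deque

class ActivityGroup:
    """A concurrency unit that handles queued requests one step at a time."""

    def __init__(self):
        self.queue = deque()   # external requests and intra-group signals
        self.trace = []        # what happened, in execution order

    def post(self, handler, *args):
        """Queue a request; it is handled in a later run-to-completion step."""
        self.queue.append((handler, args))

    def run_step(self):
        """Take one request and run until the group is stable again."""
        if not self.queue:
            return False
        handler, args = self.queue.popleft()
        handler(self, *args)
        return True

def helper(group, n):
    group.trace.append(f"helper {n}")

def on_done(group, n):
    group.trace.append(f"done {n}")

def on_touch(group, n):
    group.trace.append(f"touch {n} start")
    helper(group, n)          # intra-group operation call: handled immediately
    group.post(on_done, n)    # intra-group signal: queued for a later step
    group.trace.append(f"touch {n} end")

group = ActivityGroup()
group.post(on_touch, 1)
group.post(on_touch, 2)
while group.run_step():
    pass
```

The resulting trace shows that each step runs without interleaving, and that the signals sent during a step are handled only after the requests that were already waiting in the queue.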

We note that the model described here corresponds to that of concurrent, 
internally-sequential components (activity groups), which make visible to the 
outside world only the stable states in-between two run-to-completion steps. Such 
a model has already been used successfully by several synchronous languages.

3 Mapping UML Models to IF 

In this section we give the main lines of the mapping of a UML model to an IF 
system. The idea is to obtain a system that has the same operational semantics 
as the initial UML model (i.e. the same labeled transition system up to bisimulation).
The intermediate layer of IF helps us tackle the complexity of UML,
and provides a semantic basis for re-using our existing model checking tools (see 
section 6). 

The mapping is done in a way that all runtime UML entities (objects, call 
stacks, pending messages, etc.) are identifiable as a part of the IF model’s state. 
In simulation and verification, this allows tracing back to the UML specification. 

3.1 Mapping the Object Domain to IF 

Mapping of attributes and associations. Every class X is mapped to a
process type P_X that will have a local variable of corresponding type for each
attribute or association of X. As inheritance is flattened, all inherited attributes
and associations are replicated in the processes corresponding to each heir class.

Activity group management. Each activity group is managed at runtime by a
special process of a type called GM. This process sequentializes the calls coming
from outside the activity group, and helps to ensure the run-to-completion policy.
In each P_X there is a local variable leader, which points to the GM process
managing its activity group.

Mapping of operations and call polymorphism. For each operation m(p1 : t1,
p2 : t2, ...) in class X, the following components are defined in IF:

- a signal call_X::m(waiting : pid, caller : pid, callee : pid, p1 : t1, p2 : t2, ...)
used to indicate an operation call. If the call is made in the same activity
group, waiting indicates the process that waits for the completion of the call
in order to continue execution, caller designates the process that is waiting
for a return value, while callee designates the process corresponding to the
object receiving the call (a P_X instance).

- a signal return_X::m(r1 : tr1, r2 : tr2, ...) used to indicate the return of an
operation call (sent to the caller). Several return values may be sent with it.

- a signal complete_X::m() used to indicate completion of computation in the
operation (may differ from return, as an operation is allowed to return a
result and continue computation). This signal is sent to the waiting process
(see call_X::m).

- if the operation is primitive (see 2.2), a process type
P_X::m(waiting : pid, caller : pid, callee : pid, p1 : t1, p2 : t2, ...)
which will describe the behavior of the operation using an automaton. The
parameters have the same meaning as in the call_X::m signal. The callee PID
is used to access local attributes of the called object, via the shared variable
mechanism of IF.

- if the operation is triggered (see 2.2), its implementation will be modeled
in the state machine of P_X (see the respective section below). Transitions
triggered by an X::m call event in the UML state machine will be triggered
by call_X::m in the IF automaton.

The action of invoking an operation X :: m is mapped to the sending of a
signal call_X::m. The signal is sent either directly to the concerned object (if the
caller is in the same group) or to the object’s active group manager (if the caller
is in a different group). The group manager will queue the call and will forward
it to the destination when the group becomes stable.

The handling of incoming calls is simply modeled by transition loops (in
every state³ of the process P_X) which, upon reception of a call_X::m, will create
a new instance of the automaton P_X::m and wait for it to finish execution (see
sequence diagram in figure 2).

In general, the mapping of primitive operation (activations) into separate 
automata created by the called object has several advantages: 

— it allows for extensions to various types of calls other than the ones cur- 
rently supported in the OMEGA semantics (e.g. non-blocking calls). It also 
preserves modularity and readability of the generated model. 

— it provides a simple solution for handling polymorphic calls in an inheritance
hierarchy: if A and B are a class and its heir, both implementing the method
m, then P_A will respond to call_A::m by creating a handler process P_A::m,
while P_B will respond to both call_A::m and call_B::m, in each case creating a
handler process P_B::m (figure 3).

This solution is similar to the one used in most object-oriented programming
language compilers, where a “method lookup table” is used for dynamic
binding of calls to operations; here, the object’s state machine plays the role
of the lookup table.
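
The dispatch scheme described above can be sketched in Python. This is only an illustration, under the assumption that the process of each class is represented by a dictionary from incoming call signals to the handler process it creates. The class names A and B follow the text; the function make_dispatch and the string encoding of signals are hypothetical and are not IF syntax.

```python
# Illustrative sketch only: models how a process P_X maps incoming call
# signals to the handler process it creates. Not IF syntax.

def make_dispatch(cls, methods, parent=None):
    """Build the signal -> handler table for the process of class `cls`.

    `methods` lists the operations implemented by `cls`;
    `parent` is the dispatch table of the superclass, if any.
    """
    table = {}
    if parent is not None:
        # Signals accepted by the superclass are still accepted by the heir.
        table.update(parent)
    for m in methods:
        handler = f"P_{cls}::{m}"
        table[f"call_{cls}::{m}"] = handler
        # Overriding: inherited signals for the same operation now create
        # the heir's own handler.
        for signal in list(table):
            if signal.endswith(f"::{m}"):
                table[signal] = handler
    return table

dispatch_A = make_dispatch("A", ["m"])
dispatch_B = make_dispatch("B", ["m"], parent=dispatch_A)
```

With these tables, P_A creates P_A::m on call_A::m, while P_B creates P_B::m on both call_A::m and call_B::m, which is the behavior the state machine encodes through its transition loops.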

Mapping of constructors. Constructors (take X :: m in the following) differ
from primitive operations in one respect: their binding is static. As such, they
do not need the definition of the call_X::m signal, and the call (creation) action is
directly the creation of the handler process P_X::m. The handler process begins by
creating a P_X object and its strong aggregates, after which it continues execution
like a normal operation.

³ This is eased by the fact that IF supports hierarchical automata.

Fig. 3. Mapping of primitive operations and inheritance.

Mapping of state machines. UML state machines are mapped almost syntac- 
tically in IF. Certain transformations, not detailed here, are necessary in order to 
support features that are not directly in IF: entry/exit actions, fork/join nodes, 
history, etc. Several prior research papers tackle the problem of mapping state- 
charts to (hierarchical) automata (e.g. [31]). The method we apply is similar to 
such approaches. 

Actions. The action types supported in the original UML model are assignments,
signal output, control structure actions, object creation, method call and
return. Some are directly mapped to their IF counterparts, while the others are
mapped as mentioned above to special signal emissions (call, return) or process
creations.

3.2 Modeling Run-to-Completion Using Dynamic Priorities 

We discuss here how the concurrency model introduced in section 2.2 is realized 
using the dynamic partial priority order mechanism presented in 1.4. 

As mentioned, the calls or signals coming from outside an activity group are
placed in the group’s queue and are handled one by one in run-to-completion
steps. In IF, the group management objects (GM) handle this simple queuing
and forwarding behavior.

In order to obtain the desired run-to-completion (RTC), the following priority 
protocol is applied (the rules concern processes representing instances of UML 
classes, and not the processes representing operation handlers, etc.): 

— All objects of a group have higher priorities than their group manager:

$\forall x, y.\ (x.\mathit{leader} = y) \Rightarrow x \prec y$

This enforces the following property:

As long as an object inside the group may move, the group manager will not
initiate a new RTC step.

— Each GM object has an attribute running which points to the presently
or most recently running object in the group. This attribute behaves like a
token that is taken or released by the objects having something to execute.
The priority rule:

$\forall x, y.\ (x = y.\mathit{leader}.\mathit{running}) \land (x \neq y) \Rightarrow x \prec y$

ensures that

as long as an object that is already executing has something more to execute
(the continuation of an action, or the initiation of a new spontaneous
transition), no other object in the same group may start a transition.

— Every object x with the behavior described by a statechart in UML will
execute x.leader.running := x at the beginning of each transition. Given
the previous rule, such a transition is executed only when the previously
running object of the group has reached a stable state, which means that
the current object may take the running token safely.

The non-deterministic choice of the next object to execute in a group (stated 
in the semantics) is ensured by the interleaving semantics of IF. 
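
The effect of the two priority rules can be illustrated with a small Python sketch. It assumes that priorities only filter the set of currently enabled processes, and it is not IF code; the class Proc and the functions has_priority and may_move are hypothetical names introduced for the illustration.

```python
# Illustrative sketch only: the two priority rules as a filter over the
# set of enabled processes. Class and function names are hypothetical.

class Proc:
    def __init__(self, name, leader=None):
        self.name = name
        self.leader = leader    # group manager; None for a GM itself
        self.running = None     # used on GM objects only: the "token"
        self.enabled = False    # True if the process can fire a transition

def has_priority(x, y):
    """True if x has priority over y according to the two rules."""
    # Rule 1: every object has priority over its group manager.
    if x.leader is y:
        return True
    # Rule 2: the running object of y's group has priority over y.
    if y.leader is not None and x is y.leader.running and x is not y:
        return True
    return False

def may_move(procs):
    """Names of enabled processes not dominated by another enabled one."""
    enabled = [p for p in procs if p.enabled]
    return [y.name for y in enabled
            if not any(has_priority(x, y) for x in enabled)]

gm = Proc("GM")
a, b = Proc("a", leader=gm), Proc("b", leader=gm)
gm.running = a
for p in (gm, a, b):
    p.enabled = True
```

In this configuration only a may move. Once a has nothing left to execute, b may take the token, and the group manager may start a new step only when neither object is enabled.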



4 UML Extensions for Capturing Timing 

In order to build a faithful model of a real-time system in UML, one needs to 
represent two types of timing information: 

Time-triggered behavior (prescriptive modeling): this corresponds, for example,
to the common practice in real-time programming environments to link the
execution of an action to the expiration of a delay (represented sometimes by a
timer object).

Knowledge about the timing of events (descriptive modeling): information
taken as an assumption (hypothesis) under which the system works. Examples
are the worst case execution times of system actions, scheduler latency, etc.

In addition to that, a high-level UML model may also contain timing requirements
(assertions) to be imposed upon the system.

Different UML tools targeting real-time systems adopt different UML extensions
for expressing such timing information. A standard UML Real-Time
Profile, defined by the OMG [34], provides a common set of concepts for modeling
timing, but their definition remains mostly syntactic.

We base our work on the framework defined in [18] for modeling timed systems.
The framework reuses some of the concepts of the standard real-time
profile [34] (e.g. timers, certain data types), and additionally allows expressing
duration constraints between various events occurring in the system.



4.1 Validation of Timed Specifications 

In this section we present the main concepts taken from [18], that we use in our 
framework, and we give the principles of their mapping to IF. 

For modeling time-triggered behavior, we use timer and clock objects
compatible with those of [34], which are mapped in a straightforward manner to
IF.

The modeling of the descriptive timing information makes intensive use of 
the events occurring in a UML system execution. An event has an occurrence 
time, a type and a set of related information depending on its type. The event 
types that can be identified are listed in section 5.2, as they also constitute an 
essential part of our property specification language (presented in section 5). 
All these UML events have a corresponding event in IF. For example, the UML
event of invoking an operation X :: m corresponds to the event of sending the
call_X::m signal.

If several events of the same type and with the same parameters may occur 
during a run, there are mechanisms for identifying the particular event occur- 
rence that is relevant in a certain context. 

Between the events identified as above, we may define duration con- 
straints. The constraints may be either assumptions (hypotheses to be enforced 
upon the system runs) or assertions (properties to be tested on system runs). 

The class diagram example in figure 4 shows how these events and duration
constraints may be used in a UML model. This model describes a typical client-server
architecture in which worker objects on the server are supposed to expire
after a fixed delay of 10 seconds. A timing assumption attached to the client says
that: “whenever a client connects to the server, it will make a request before its
worker object expires, that is before 10 seconds”.

For testing or enforcing a timing constraint from the UML model, we are 
presented with two alternatives: 

— if the constraint is local to an object, i.e. all involved events are directly
observed by the object, the constraint may be tested or enforced by the IF
process implementing the object⁴. It will use an additional clock for measuring
the duration concerned by the constraint, and a transition to an error
state (in case of an assertion) or to an invalid state (in case of an assumption)
with an appropriate guard on that clock.

— if the constraint is not local to an object (we call it global ), the constraint 
will be tested or enforced by an observer running in parallel with the system. 

⁴ This is the case in figure 4. In general, outputs and inputs of a process are directly
observed by itself, but they are not visible to other processes.

[Figure 4: class diagram with two «TimedEvent» classes, EC and ER, each listing
W : LicClient, and the match clauses
"match receivereturn LicServer::connect(void) by c" and
"match invoke LicClientWorker::request(void) by c".]

Fig. 4. Using events to describe timing constraints.



The tools will ensure that runs not satisfying a constraint are either ignored
(if it is an assumption) or diagnosed as errors (if it is an assertion).
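
The treatment of a local duration constraint with one additional clock can be sketched in Python on a recorded run. The sketch assumes that a run is a list of (time, event) pairs and that the events "connect" and "request" stand for the two timed events of figure 4. The function classify_run is a hypothetical name, and the sketch does not reflect how the IF tools are implemented.

```python
# Illustrative sketch only: checks a duration constraint "request occurs
# within `bound` time units after connect" on one recorded run.

def classify_run(run, bound, kind):
    """Return 'ok', 'ignored' or 'error' for a run.

    `run` is a list of (time, event) pairs sorted by time.
    `kind` is 'assumption' or 'assertion'.
    A connect still pending at the end of the run is not judged.
    """
    clock_start = None                  # time at which the clock was reset
    for time, event in run:
        if clock_start is not None and time - clock_start > bound:
            # The guard on the clock is violated before a request is seen:
            # invalid state for an assumption, error state for an assertion.
            return "ignored" if kind == "assumption" else "error"
        if event == "connect":
            clock_start = time          # reset the clock
        elif event == "request":
            clock_start = None          # constraint satisfied, stop the clock
    return "ok"
```

For example, classify_run([(0, "connect"), (12, "request")], 10, "assumption") yields "ignored", while the same run checked as an assertion yields "error".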



5 Dynamic Properties Written as UML Observers 

We discuss in this section a technique for specifying and verifying dynamic prop- 
erties of UML models, that we call UML observers. Similarly to IF observer 
automata (section 1.4), UML observers are special objects which run in parallel 
with a UML system and monitor its state and the events that occur. 

Syntactically, observers are described by special UML classes stereotyped
with «observer». They may own attributes and methods, and may be created
dynamically. An important part of the observer is its state machine, which is
triggered by events occurring in the UML model, as we will see in the following.
The main issue in defining UML observers is the choice of visible event types
(which include specific UML event types like operation invocation, etc.).

For UML users, the advantage of UML observers compared to other prop- 
erty specification languages is that they use concepts that are known to UML 
designers (event driven state machines) while remaining sufficiently formal and 
expressive. 



5.1 An Example of Property 

Let us take a simple example: assume that we have a point-to-point communication
protocol described in UML. Two interfaces TX and RX encapsulate the
transmission and reception operations, and, to simplify, at runtime there exists
exactly one object implementing each interface. The interface TX has one blocking
operation put(p : Data) (where Data is the packet type) and the interface
RX has one blocking operation get() that returns a Data.












I. Ober, S. Graf, and I. Ober 




Fig. 5. Example of observer for a safety property. 



Assume that we want to express the following reliability property: whenever 
put is called with some Data, within at most 5 time units the same Data is 
received at the other end. This also supposes that the user at the other end has 
called get within this time frame, reception being signified by the return from 
get. This property is specified in the observer in figure 5. 
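
To make the property concrete, the following Python sketch checks it on a recorded trace. It is not the observer of figure 5. It assumes that a trace is a list of (time, event, data) triples, that "put" stands for the call of TX::put and "get_return" for the return from RX::get, and that each Data value is sent at most once. The function name observe is hypothetical.

```python
# Illustrative sketch only: an observer for "every Data passed to put is
# returned by get within 5 time units", run over a recorded trace.

def observe(trace, bound=5):
    """Return 'error' if the property is violated on `trace`, else 'ok'.

    `trace` is a list of (time, event, data) triples sorted by time, where
    event is 'put' (call of TX::put) or 'get_return' (return of RX::get).
    A put still pending at the end of the trace is not judged.
    """
    pending = {}                        # data -> time of the put call
    for time, event, data in trace:
        # Any pending put older than `bound` drives the observer to error.
        if any(time - t > bound for t in pending.values()):
            return "error"
        if event == "put":
            pending[data] = time
        elif event == "get_return":
            pending.pop(data, None)
    return "ok"
```

The dictionary plays the role of the observer variable that stores the Data value and of the clock started when put is called; reaching the "error" result corresponds to entering an error state.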



5.2 Basic Observer Ingredients 

Important ingredients of the observer in figure 5 are the event specifications
on some transitions. Here, the notion of event and the event types are the ones
introduced in [18]:

— Events related to operation calls: invoke, receive (reception of call), accept
(start of actual processing of call, which may be different from receive),
invokereturn (sending of a return value), receivereturn (reception of the
return value), acceptreturn (actual consumption of the return value).

— Events related to signal exchange: send, receive, consume. 

— Events related to actions or transitions: start, end (of execution). 

— Events related to states: entry, exit. 

— Events related to timers (this notion is specific to the model considered in 
[16,18] and in this work): set, reset, occur, consume. 

The trigger of an observer transition may be a match clause, in which case
the transition will be triggered by certain types of events occurring in the UML
model. The clause specifies the type of event (e.g. receive in figure 5), some
related information (e.g. the operation name TX :: put) and observer variables
that may receive related information (e.g. m which receives the value of the Data
parameter of put in the concerned call).

Besides events, an observer may access any part of the state of the UML 
model: object attributes and state, signal queues. 

As in IF observers, properties are expressed by classifying observer states
as error or ordinary. Note that an observer may also be used to formalize a
hypothesis on system executions, in which case the observer error states mark the
system states that should be considered invalid with respect to the assumptions.

Fig. 6. Architecture of the UML-IF validation toolbox.

Expressing timing properties.

Certain timing properties may be expressed directly in a UML model using the
extensions presented in section 4. However, more complicated properties which
involve several events and more arbitrary ordering between them may be written
using observers. In order to express quantitative timing properties, observers may
use the concepts available in our extension of UML, such as clocks.

6 The Simulation and Verification Toolset 

The principles presented in the previous sections are being implemented in the
UML-IF validation toolbox⁵, the architecture of which is shown in figure 6. With
this tool, a designer may simulate and verify UML models and observers developed
in third-party editors⁶ and stored in XMI⁷ format. The functionality offered
by the tool is that of an advanced debugger (with step-back, scenario generation,
etc.) coupled with a model checker for properties expressed as observers.

⁵ See http://www-verimag.imag.fr/PEOPLE/ober/IFx.

⁶ Rational Rose, I-Logix Rhapsody and Argo UML have been tested for the moment.

⁷ XMI 1.0 or 1.1 for UML 1.4

In a first phase, the tool generates an IF specification and a set of IF observers
corresponding to the model. In a second phase, it drives the IF simulation and
verification tools so that the validation results fed back to the user may be
marshaled back to the level of the original model. Ultimately, the IF back-end tools
will be invisible to the UML designer.

As mentioned in the introduction, by using the IF tools as the underlying engine,
the UML tools have access to several model reduction and analysis techniques
already implemented. Such techniques aim at improving the scalability of the
tools, which is essential in a UML context. Among them, it is worth mentioning static
analysis and optimizations for state-space reduction, partial order reductions,
some forms of symbolic exploration, model minimization and comparison [6,9].

A first version of this toolset exists and is currently being used on several 
case studies in the context of the OMEGA project. 

7 Conclusions and Plans for Future Work 

We have presented a method and a tool for validating UML models by simulation
and model checking, based on a mapping to an automata-based model
(communicating extended timed automata).

Although this problem has been previously studied [14,29,28,27,26,35], our
approach introduces a new dimension by considering the important object-oriented
features present in UML: inheritance, polymorphism and dynamic binding
of operations, and their interplay with statecharts. We give a solution for
modeling these concepts with automata: operations are modeled by dynamically
created automata, and thus call stacks are implicitly represented by chains
of communicating automata. Dynamic binding is achieved through the use of
signals for operation invocation. We also give a solution for modeling run-to-completion
and a chosen concurrency semantics using dynamic priorities.

Our experiments on small case studies show that the simulation and model 
checking overhead introduced by modeling these object-oriented aspects remains 
low, thus not hampering the scalability of the approach. 

For writing and verifying dynamic properties, we propose a formalism that 
remains within the framework of UML: observer objects. We believe this is an 
important issue for the adoption of formal techniques by the UML community. 
Observers are a natural way of writing a large class of properties (linear prop- 
erties with quantitative time). 

In the future we plan to: 

— Assess the applicability of our technique to larger models. The tool is already 
being applied to a set of case studies provided by industrial partners in the 
OMEGA project. 

— Extend the language scope covered by the tool. We plan to integrate the 
component and architecture specification framework defined in OMEGA. 

— Improve the ergonomics and integration of the toolset (e.g. the presentation
of validation results in terms of the UML model).

— Study the possibility of using the additional structure available in the object-oriented
UML models for improving verification, static analysis, etc.

Acknowledgements. The authors wish to thank Marius Bozga and Yassine
Lakhnech who contributed with ideas and help throughout this work.

Abstract. The Murφ-based Hopper tool is a general purpose explicit model
checker. Hopper leverages Murφ’s class structure to implement new algorithms.
Hopper differs from Murφ in that it includes in its distribution published parallel
and disk based algorithms, as well as several new algorithms. For example, Hopper
includes parallel dynamic partitioning, cooperative parallel search for LTL
violations and property-based guided search (parallel or sequential). We discuss
Hopper in general and present a recently implemented randomized guided search
algorithm. In multiple parallel guided searches, randomization increases the
expected average time to find an error but decreases the expected minimum time to
find an error.



The Hopper¹ tool leverages the Murφ architecture to implement parallel, disk-based and
heuristic model checking algorithms. The common theme in the algorithms implemented
in Hopper is that they do not use abstraction. Instead, Hopper explores fundamental
algorithms for reducing time and space capacity limitations in state generation and
storage. Our intention is that algorithms implemented in Hopper can be combined with
well-known abstraction techniques. The algorithms studied in Hopper are generic enough
to be implemented in any state enumeration context, including software model checking.
The current release of Hopper contains parallel and disk based algorithms published by
Dill and Stern [1,2] and heuristic search using the heuristic proposed by Edelkamp [3].
Hopper also includes parallel and guided search algorithms developed by the BYU model
checking research group [4,5,6]. Hopper is a testbed for ideas that will be incorporated
in our forthcoming C/C++ model checker built as an extension of the GNU debugger
(GDB). The Hopper distribution includes a suite of 177 benchmark verification problems
for Murφ.

This paper describes the architecture and algorithms implemented in Hopper along
with a new randomized guided algorithm we have recently implemented. Randomization
increases the variance (and the mean) of the search effort required to find a property
violation. Search effort is measured by the number of transitions taken. In some problems,
increasing the variance (even at the expense of increasing the mean) decreases
the expected minimum number of transitions taken in error discovery using parallel
searches.

¹ Named after Edward Hopper (1882-1967), an early realist painter. Neither Hopper the artist
nor Hopper the tool relies on abstraction.

1 Hopper

Hopper is a general purpose explicit state model checker built on the Murφ code base.
Hopper, like Murφ, uses a rule-based input language for model descriptions. Although
it is not process based, like Promela or CSP, it is sufficient to describe large complex
transition systems [7]. Hopper adds polymorphism to Murφ’s code base to implement
new algorithms. The design philosophy of Hopper is to minimally alter code in the basic
Murφ distribution when adding new functionality through polymorphism. The behavior
of key classes is redefined in a separate code base. This design philosophy treats Murφ
as an application programmer interface (API) to prototype new algorithms for empirical
analysis. Hopper does not support Murφ symmetry reductions in parallel, randomized
or disk based algorithms.

Hopper includes an implementation of the Stern and Dill parallel model checking
algorithm [1] and the disk based algorithm from [2]. The parallel algorithm in Hopper is
implemented with MPICH 1.2.5 for the communication layer and a modified Dijkstra’s
token-based termination detection algorithm. MPICH is a free MPI implementation that is
portable across several different communication fabrics. The modification to Dijkstra’s
token termination algorithm is required because communication in the parallel algorithm
is not limited to a ring topology, as required by Dijkstra’s algorithm. The modification
adds message count information to the token. Termination is detected when the token
travels around the logical ring and both retains the correct color and indicates that the
number of messages sent is equal to the number of messages received. After detection,
termination is completed by passing a poison pill through the ring. The modified Dijkstra’s
token termination algorithm is more reliable than the algorithm based on idle time
used in [1]. The Hopper implementation has been successfully tested and analyzed on
two platforms with 256 processors and different communication fabrics [5].
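
The termination condition described above (token color plus message counts) can be sketched in Python on a snapshot of the nodes. This is not Hopper's MPI implementation. The sketch assumes that a node turns black when it receives work, that the token accumulates the difference between messages sent and received, and that a round is simply abandoned when a node is busy, whereas a real implementation would hold the token until the node becomes idle. The names Node and token_round are hypothetical.

```python
# Illustrative sketch only: one circulation of a termination token that
# carries a color and a message balance.

WHITE, BLACK = "white", "black"

class Node:
    def __init__(self):
        self.idle = True
        self.color = WHITE   # assumed to turn BLACK when the node receives work
        self.sent = 0        # messages sent so far
        self.received = 0    # messages received so far

def token_round(nodes):
    """Pass a token once around the logical ring.

    Returns True if termination is detected, i.e. the token stays white
    and the number of messages sent equals the number received.
    """
    token_color, balance = WHITE, 0
    for node in nodes:
        if not node.idle:
            return False                 # round abandoned: node still busy
        if node.color == BLACK:
            token_color = BLACK          # activity since the last visit
        balance += node.sent - node.received
        node.color = WHITE               # node is whitened as the token leaves
    return token_color == WHITE and balance == 0
```

A non-zero balance means that a message is still in flight, so a round in which all nodes happen to be idle is not enough to declare termination; this is the information that the message counts add to the colored token.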

Hopper also includes a parallel algorithm that uses dynamic partitioning to aggregate
memory on multiple computation nodes. The Stern and Dill algorithm uses a static hash
function to distribute known reachable states in the model across computation nodes.
An imbalanced distribution, however, may not efficiently utilize the aggregated memory
since it may prematurely drive a node to its maximum capacity before all reachable
states have been enumerated. The dynamic partition algorithm in Hopper constructs the
partition function on-the-fly. The Murφ architecture simplifies the use of either the static
or dynamic partitioned hash table when running any given search algorithm.

Hopper includes a visualization toolkit for postmortem analysis of parallel model 
checking algorithm behavior through time. This Java based toolkit reads time stamped 
entries from Hopper log files. The time series data is then reconstructed and animated. 
The default configuration shows, for each computation node, the size of its state queue, 
the total number of states in its hash table, the number of states sent, and the number of 
states received as dynamic bar charts. 

Hopper implements a cooperative parallel search algorithm for finding LTL violations.
The bee-based error exploration (BEE) algorithm is designed to operate in a
non-dedicated parallel computing environment. It does this by employing a decentralized
forager allocation scheme exhibited as a social behavior by honeybee colonies.
Forager allocation involves identifying flower patches and allocating foragers to forage
for resources at the patches. In LTL search, flower patches map to accept states and foraging
maps to finding cycles that contain accept states. The resulting algorithm searches
for accept states, then allocates workstations to forage for cycles beginning at accept
states. A complete presentation and analysis of the BEE algorithm is given in [4].

Hopper implements property-based guided search in either parallel or sequential
modes. The Hopper distribution uses admissible and inadmissible versions of property-based
heuristics given by Edelkamp et al. in [3]. Hopper also implements a Bayes
heuristic search (BHS) to improve the expected accuracy of estimates treated as random
variables (i.e., functions that assign a real valued probability between 0 and 1 to
each possible outcome of an event). A probability density function characterizes the
distribution of confidence in the heuristic. If the heuristic is accurate, then most of the
probability is close to the actual distance to the target. The BHS algorithm minimizes
mean squared error in heuristic estimates using an empirical Bayes [8] meta-heuristic.
This is done using sets of sibling states to derive the confidence that should be attributed
to each individual estimate. The confidence level is then used to proportionally revise the
original estimate toward the mean of the sibling estimates. The theoretical and empirical
validation of the approach using a Bayesian model is given in [6]. The analysis shows
that the resulting improved heuristic values have smaller total expected mean squared
error.
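
The idea of revising each estimate toward the mean of its siblings can be illustrated with a generic shrinkage rule in Python. This is not the BHS formula, which is given in [6]. The weighting used here, and the parameter noise_var (an assumed variance of the heuristic's error), are assumptions made only for the illustration; the function name shrink is hypothetical.

```python
# Illustrative sketch only: generic shrinkage of heuristic estimates toward
# the sibling mean. The weighting rule is an assumption, not the BHS formula.
from statistics import mean, pvariance

def shrink(estimates, noise_var):
    """Revise each sibling estimate toward the mean of all siblings.

    `noise_var` is an assumed variance of the heuristic's error. The
    confidence is the share of the observed spread not explained by noise.
    """
    center = mean(estimates)
    spread = pvariance(estimates)
    if spread == 0:
        return list(estimates)          # all siblings agree, nothing to revise
    confidence = max(0.0, 1.0 - noise_var / spread)
    return [center + confidence * (h - center) for h in estimates]
```

With full confidence the estimates are left unchanged, and with no confidence every sibling receives the sibling mean; intermediate confidence values move each estimate proportionally toward that mean.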

A model database for empirical testing is a final piece of Hopper. The primary obstacle
to designing and comparing state enumeration algorithms is a lack of performance data
on standardized benchmarks. This lack of data obscures the merits of new approaches to
state enumeration. Hopper includes a set of 177 benchmark models with a web portal
to add new models and report new benchmark results. The web portal for the database
is located at http://vv.cs.byu.edu.



2 Randomized Guided Search 

Randomizing the guided search algorithm is intended to improve the decentralized parallel
search for LTL violations. The decentralized parallel searches will cover more of the
search space if they do not all share the same deterministic behavior. Random walk
is a trivial, but surprisingly effective, way to distribute the searches. In terms of the
expected number of states explored before finding an error, randomized guided search
aims to achieve a variance near that of random walk with a mean near that of guided
search.

The guided search is randomized by selecting the next state to expand randomly
from the first n states in the priority queue. Randomizing next state selection increases
the variance of the expected number of states expanded before finding an error. Suppose
X is a random variable that represents the number of states expanded before reaching
an error in a given model for some amount of randomization n. In our experiments, X
follows a normal distribution with a mean μ and a variance σ². Increasing randomization
increases both σ² and μ. Randomization can improve search performance because the
probability of observing a small value of X increases logarithmically in σ², if μ remains
unchanged.
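
The selection rule described above can be sketched in Python as a best-first search that picks uniformly among the first n entries of the priority queue. This is a minimal sketch and not Hopper's implementation. It assumes states are hashable and comparable (for example integers), and the function name and parameters are hypothetical.

```python
# Illustrative sketch only: best-first search that expands a state chosen
# uniformly at random among the n best states in the priority queue.
import heapq
import random

def randomized_guided_search(start, successors, heuristic, is_error, n, rng=None):
    """Return the number of states expanded before an error is reached,
    or None if no error state is reachable."""
    rng = rng or random.Random()
    queue = [(heuristic(start), start)]
    visited = {start}
    expanded = 0
    while queue:
        # Pop the (at most) n best states, pick one at random, requeue the rest.
        best = [heapq.heappop(queue) for _ in range(min(n, len(queue)))]
        _, state = best.pop(rng.randrange(len(best)))
        for entry in best:
            heapq.heappush(queue, entry)
        if is_error(state):
            return expanded
        expanded += 1
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(queue, (heuristic(nxt), nxt))
    return None
```

With n = 1 the search is the deterministic guided search; larger values of n spread independent runs over different parts of the state space, which is the source of the increased variance discussed in the text.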

Unfortunately, randomization of guided search can increase μ. In other cases, randomization
decreases μ. In general, increasing randomization in guided best-first search
drives μ toward the number of states expanded by breadth-first search. If breadth-first
search expands fewer states than non-randomized guided search, then randomization
decreases μ. Otherwise, randomization increases μ.

Fig. 1. Ratio of minimum value in 100 samples of randomized guided search and best deterministic
search. A ratio less than one indicates that parallel randomization improved search performance.

Taking multiple samples of X logarithmically increases the probability that any one
sample will be less than a given threshold. This produces a logarithmic speedup when
performing independent randomized guided searches in parallel. This is similar to the
amplification of stochastic advantage in a one-sided probabilistic algorithm.
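
For independent searches, the amplification effect can be written down directly: if a single search falls below the threshold with probability p, then at least one of k searches does so with probability 1 - (1 - p)^k. The following Python sketch computes this elementary quantity; the function name is hypothetical and the independence of the searches is an assumption.

```python
# Illustrative sketch only: probability that at least one of k independent
# randomized searches finishes below a threshold, given that a single search
# does so with probability p.

def prob_any_below(p, k):
    return 1.0 - (1.0 - p) ** k
```

For instance, with p = 0.05 a single search rarely succeeds within the threshold, while among 100 independent searches at least one succeeds with probability of about 0.994.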

We have implemented the randomized guided search algorithm in Hopper, and we
have conducted a series of experiments to assess the impact of randomization on guided
search using the BHS algorithm. The amount of randomization was controlled by varying
n, the pickset size, which is the number of states in the prefix of the priority queue from
which the next state to expand was chosen. Each search result is computed by observing
the outcomes of 100 trials and taking the outcome with the smallest number of states
expanded. This is done for each model/pickset combination.

We chose five models for which guided search had a wide range of effects. The effect
of heuristic guidance on a search problem can be measured using the guided-speedup
(GS), which is the BFS transition count divided by the guided search transition count.
The models used in the experiments have GS ratios ranging from 100 (meaning guided
search was 100 times faster) to 0.36 (meaning guided search was almost 3 times slower).
The GS ratios for each model are included later in the legend for Figure 1. Each model
contains at least one violation of its invariant.

Two sets of experiments were conducted: one to determine the effects of randomization
on the adash1212e model (for which guided search was particularly ineffective) and
one to determine the effects of randomization on the five models. The adash1212e model
was tested with picksets ranging from 2 to 2000 states. As the pickset size increased,
the mean number of states expanded decreased. For every pickset size, the minimum
number of states explored by any one node was less than that of both the deterministic BFS and
parallel pure random walks.

Each of the 5 models was tested with pickset sizes of 2, 3, 4 and 5. The effect
of randomization on the minimum sample drawn from 100 experiments is shown in
Figure 1. Figure 1 plots the ratio of the number of transitions taken in the minimum of
the 100 samples and the smaller of either the BFS or deterministic guided search. A ratio
less than 1 indicates that randomization led to faster error discovery.

The series of experiments with adash1212e demonstrates that, for models in which
the heuristic performs extremely poorly, increasing randomization results in steadily
decreasing search times. For all 5 models, choosing randomly from the first 2 to 4 states
in the priority queue gives almost all of the reduction in transition count while avoiding
state explosion. These results taken together suggest that choosing from the first four
states in the priority queue balances randomization and guidance.



3 Conclusion 

Hopper is a general purpose model checker built on top of the Murφ code base. It uses
Murφ as an API that provides low-level building blocks for new algorithms. Hopper
includes several published and unpublished algorithms, as well as a visualization tool
and a model database. Recent algorithms in Hopper are a BHS and a parallel randomized
guided search. Future work for Hopper includes the implementation of novel disk-based,
shared memory and multi-agent search algorithms. Hopper is a testbed for algorithms
that increase capacity and can be incorporated into any state enumeration tool, including
SPIN.

Abstract. We report on recent enhancements of the CADP verification
tool set that make it possible to check the correctness of event traces obtained
by simulating or executing complex, industrial-size systems. Correctness
properties are expressed using either regular expressions or modal
μ-calculus formulas, and verified efficiently on very large traces.



1 Introduction 

Trace-based verification [3,10,15,16] consists in assessing the correctness of a
(software, hardware, telecommunication...) system by checking a set of event
traces, i.e., chronological lists of input/output events sent/received by this
system. Although trace-based verification is more limited than general
verification on state graphs or Labelled Transition Systems (Ltss), it might be the only
option for “real” systems that run as “black boxes”, disclose little or no
information about their internal state, and provide no means for an external observer
to control or simply know about their branching structure (i.e., the list of
possible transitions permitted in a given state). This is particularly true when the
source code of these systems is not available, or cannot be instrumented easily.

The importance of trace-based verification is widely recognized in the
hardware community, where traces might be the only information available during
the execution of a circuit or the simulation of an Hdl model. In particular, there
are recent efforts to standardize the use of temporal logics (e.g., Sugar [4] and
ForSpec [2]) for trace-based verification.

Trace-based verification can be either on-line (i.e., verification is done at the
same time the trace is generated) or off-line (i.e., the trace is generated first,
stored in a file and verified afterwards). On-line verification avoids storing the
trace in a file, but may become slower if several correctness properties must
be checked on the same trace, in which case it might be faster to generate the
trace only once and perform all verifications off-line.

In this paper, we present a general solution for off-line trace-based
verification, which is easily applicable to the traces generated by virtually any system.
Traces are encoded in a very simple, line-based format. Correctness properties
are specified using either regular expressions or μ-calculus formulas, and model
checked using dedicated tools of the widespread Cadp verification tool set [9].
Notice that on-line trace-based verification could also be addressed by the Cadp
tools supporting on the fly verification, although this is beyond the scope of this
paper.

* This research was partially funded by Bull S.A. and the European IST-2001-32360
project “ArchWare”.



2 Assumptions on Trace Structure and Representation 



Following the “black box” testing paradigm, we assume that the internal state
of the system is not available for inspection. Thus, a trace is defined as a
(degenerated) Lts $(S, A, T, s_0)$ consisting of a set $S$ of states, a set $A$ of actions
(transition labels corresponding to the input/output communications of the
system), a transition relation $T \subseteq S \times A \times S$, and an initial state
$s_0 \in S$. Should (a part of) the internal state be observable, then this information
could be encoded in the actions without loss of generality [14].

We then assume that the length, i.e., the number of states (and transitions) 
in a trace can be large (e.g., several millions), since traces can be produced 
by hour- or day-long simulation/execution of the system. In fact, the number of 
states can be as large as for classical explicit-state verification (with the difference 
that traces are particular Ltss with a tiny breadth and a huge depth). 

We make no special assumption regarding actions. Their contents are
unrestricted and may include any sequence of data, including variable-length values
such as lists, sets, etc. We therefore represent actions as arbitrary-length
character strings. As a consequence, the set $A$ of all possible actions may be very
large (or even unbounded), so that it might be prohibitive (or even infeasible) to
enumerate its elements. Finally, we make no assumption of regularity or locality
in the occurrence of actions. In the worst case, a trace might contain as many
different actions as it contains transitions.

In general, traces might be too large to fit into main memory entirely and 
must be stored in computer files instead. The Cadp tool set provides a textual file 
format (the Seq format) for representing traces. So far, this format was mostly 
used to display the counter-examples generated by Cadp model-checkers, but, 
since this format satisfies the above assumptions, we decided to adopt it for 
trace-based verification as well. In practice, it is often convenient to store in the 
same file several traces issued from the same initial state. For this reason, a Seq 
file consists of a set of finite traces, separated by the choice symbol “[]”, which 
indicates the existence of several branches starting at the initial state. Each 
trace consists of a list of character strings (one string per line) enclosed between 
double quotes, each representing one action in the trace. The Seq format also 
admits comments (enclosed between the special characters “\001” and “\002”). 
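
The structure just described can be illustrated with a small reader. This is only
a sketch built from the description above, not the Cadp implementation: the name
parse_seq is hypothetical, the comment delimiters are assumed to be the control
characters with codes 1 and 2, and quoted strings are assumed to contain no line
breaks. It loads the whole file into memory, so it suits small examples only.

```python
import re

# Comments are assumed to be enclosed between the characters \001 and \002.
COMMENT = re.compile("\x01.*?\x02", re.DOTALL)


def parse_seq(text):
    """Return the traces of a Seq-like text as a list of lists of action strings."""
    text = COMMENT.sub("", text)
    traces, current = [], []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line == "[]":  # choice symbol: a new trace starts at the initial state
            traces.append(current)
            current = []
        elif len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            current.append(line[1:-1])  # one action per line, quotes removed
        else:
            raise ValueError(f"unexpected line: {line!r}")
    traces.append(current)
    return traces
```

For instance, a text with two quoted lines, a line holding only the choice symbol,
and one more quoted line yields two traces of lengths 2 and 1.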
3 Principles of the Seq.Open Tool

All the Cadp tools that operate on the fly (i.e., execution, simulation, test
generation, and verification tools) rely upon the Open/Cæsar [8] software
framework. Due to the modularity and reusability brought by Open/Cæsar, there was
no need to develop yet another model checker dedicated to traces encoded
in the Seq format. The proper approach was to design a new tool (named
Seq.Open) that connects the Seq format to Open/Cæsar, thus allowing all the
Open/Cæsar tools (including model checkers) to be applied to traces without
any modification.

A central feature of Open/Cæsar is its generic Api (Application Programming
Interface) providing an abstract representation for on the fly Ltss. This
Api clearly separates language-dependent aspects (translation of source
languages into Lts models, which is done by Open/Cæsar-compliant compilers
implementing the Api) from the language-independent aspects (on the fly Lts
exploration algorithms built on top of the Api). In a nutshell, the Api consists
of two types “Lts state” and “Lts label”, equipped with comparison, hash, and
print functions, and two operations computing the initial state of the Lts and
the transitions going out of a given state.

Seq.Open is an Open/Cæsar-compliant compiler that maps a Seq file
onto the aforementioned Api (see Figure 1). A set of $n$ traces contained in
a Seq file can be viewed as an Lts with three types of states: deadlock states
(terminating states, with 0 successors¹); normal states (intermediate states, with
1 successor); and the initial state (common to all traces, with $n$ successors). The
user of Seq.Open may decide to explore all the $n$ traces, or only the $i$-th one
($1 \le i \le n$).

An Lts label is implemented by Seq.Open as an offset in the Seq file (the
offset returned by the Posix function ftell() for the double quote opening
the label character string). This representation is not canonical: The same label
occurring at different places in the file is represented by different offsets.

States also are implemented as offsets. Each deadlock state is represented by
the special offset -1. Each normal state $s$ is represented by the offset of the label
of the transition going out of $s$. The initial state is represented by the offset of
the first label of the first trace to be considered. Contrary to labels, the state
representation is canonical (up to graph isomorphism).

Fig. 1. The Seq.Open Tool

1 All traces end in a deadlock state. If necessary, a distinction can be made between
successful and abnormal termination by considering the action of the last transition
preceding the deadlock state.

A transition $(s_1, a, s_2)$ of the Lts is encoded by a couple $(o_1, o_2)$, where $o_1$ is
the offset of state $s_1$ (equal to the offset of label $a$) and $o_2$ is the offset of state
$s_2$. The transition relation is implemented as follows. Deadlock states have no
successors. For a normal state $s$ with offset $o_1$, the offset $o_2$ of its successor is
computed by positioning the file cursor at $o_1$ using the fseek() function, reading
the character string of the transition label going out of $s$ (possibly skipping
comments), then taking for $o_2$ the offset returned by the ftell() function. For
the initial state, the successors are computed only once at initialization. This is
done using a preliminary scan of the Seq file, unless the user wants to consider
the first trace only (a frequent situation for which no preliminary scan is needed).
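
The offset-based encoding can be mimicked with Python's file methods seek() and
tell(), which play the role of fseek() and ftell(). The sketch below is an
illustration under simplifying assumptions, not the Seq.Open code: it handles a
single trace, with one quoted label per line, no comments, no blank lines and a
final newline; the names read_label and walk are hypothetical.

```python
import io

DEADLOCK = -1  # special offset standing for every terminating state


def read_label(f, offset):
    """Seek to a label's offset; return its string and the offset after it."""
    f.seek(offset)
    line = f.readline().decode("utf-8").rstrip("\n")
    return line[1:-1], f.tell()  # strip the enclosing double quotes


def walk(f):
    """Yield (state offset, label, successor offset) along a single trace."""
    end = f.seek(0, io.SEEK_END)
    state = 0 if end > 0 else DEADLOCK  # initial state: offset of the first label
    while state != DEADLOCK:
        label, after = read_label(f, state)
        successor = after if after < end else DEADLOCK
        yield state, label, successor
        state = successor
```

Only integers are kept per state, so memory use does not depend on the length of
the labels; the price is one seek and one read per visited transition.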

To reduce the time overhead induced by calls to fseek(), i.e., back and
forth skips inside the Seq file, we introduced a hash-based cache table similar to
those used in Bdd implementations. This table has a prime number $N$ of entries,
which is chosen by the user and remains constant regardless of the number of
visited states/transitions. The table speeds up both (1) for a label $a$ known by its
offset, the computation of the character string of $a$, and (2) for a normal state
$s_1$ known by its offset, the computation of the outgoing transition $(s_1, a, s_2)$.
Precisely, for a label (resp. normal state) with offset $o_1$, the table entry of index
$o_1 \bmod N$ may contain the character string of label $o_1$ (resp. the character string
of the transition label going out of state $o_1$, and the successor state offset $o_2$);
if this entry is already occupied by another label (resp. state), its contents will
be erased and replaced with information corresponding to the label (resp. state)
with offset $o_1$ (this information will be computed using fseek() and ftell()
as explained above).
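
A table of this kind behaves like a direct-mapped cache. The following sketch
shows the replacement policy only; the class name OffsetCache and the load
callback (standing for the slow path through the file) are assumptions made for
the example.

```python
class OffsetCache:
    """Fixed-size table indexed by offset mod N; a collision overwrites the entry."""

    def __init__(self, size, load):
        self.size = size            # number of entries N (a prime in the text above)
        self.load = load            # slow path: offset -> value (e.g. seek and read)
        self.slots = [None] * size  # each slot holds (offset, value) or None
        self.misses = 0

    def get(self, offset):
        index = offset % self.size
        slot = self.slots[index]
        if slot is not None and slot[0] == offset:
            return slot[1]          # hit: no file access needed
        self.misses += 1
        value = self.load(offset)
        self.slots[index] = (offset, value)  # erase and replace the previous occupant
        return value
```

Because the table never grows, memory stays bounded by N entries whatever the
trace length; two offsets that are congruent modulo N simply evict each other.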

The other operations of the Open/Cæsar Api are performed as follows.
Comparison of states is simply an equality test of their offsets (since state offsets
are canonical). Comparison of labels is done by comparing their character strings
(since label offsets are not canonical), a comparison that is sped up when labels
are already present in the cache.

4 Verification of Trace Properties

Cadp provides two different tools for checking general properties over traces on
the fly².

Exhibitor makes it possible to check linear-time properties expressed as regular
expressions over traces. Individual actions in a trace are characterized by boolean
formulas consisting of action predicates (plain character strings or UNIX-like
regular expressions matching several character strings) combined using boolean
connectors such as negation (“~”), disjunction (“|”), and conjunction (“&”).

Regular expressions over traces consist of these boolean formulas combined
using regular operators such as concatenation (newline character “\n”), choice
(“[]”), and iteration (“*” and “+”). A special “<deadlock>” operator
characterizes deadlock states. Additional operators inspired from linear temporal logic
are provided as shorthand notations: “<until> P” is equivalent to “(~P)*” and
“<while> P <until> Q” is equivalent to “(P & ~Q)* \n Q”.
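
To make the last expansion concrete, the sketch below checks whether a prefix of
a single trace matches “(P & ~Q)* \n Q”. It is an illustration of that regular
expression only and does not reproduce how Exhibitor searches an Lts; the function
name and the use of Python predicates for P and Q are assumptions.

```python
def matches_while_until(trace, p, q):
    """True if some prefix of `trace` matches (P & ~Q)* followed by one Q step."""
    for action in trace:
        if q(action):
            return True   # the closing Q step has been reached
        if not p(action):
            return False  # neither Q nor P holds: the pattern is broken
    return False          # the trace ended before Q occurred
```

Since the two alternatives (P & ~Q) and Q exclude each other, a single left-to-right
pass is enough and no backtracking is required.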

Evaluator [13] makes it possible to check branching-time properties expressed in
alternation-free μ-calculus [7], a specification formalism for which efficient
model checking algorithms exist [5] with a linear (time and space) complexity
$O(|\varphi| \cdot (|S| + |T|))$, where $|\varphi|$ is the number of operators in the
formula $\varphi$ to be checked, and where $|S|$ and $|T|$ are the respective numbers of
states and transitions in the Lts under verification. It was shown recently that on acyclic
Ltss (which contain traces as a particular case), the alternation-free μ-calculus
has the same expressive power as the full μ-calculus [11]. This result makes it
possible, in the case of acyclic Ltss, to benefit from the expressiveness of the full
μ-calculus (which subsumes most temporal logics, including Ctl and Pdl [7], as well as
Ltl and Ctl* [6]) while keeping model checking algorithms with a linear (rather
than exponential) complexity. Furthermore, space complexity can be reduced
down to $O(|\varphi| \cdot |S|)$ (still maintaining a linear complexity in time) by using
specialized algorithms for checking alternation-free μ-calculus formulas on acyclic
Ltss [11,12].

Both Exhibitor and Evaluator provide diagnostic generation features,
which exhibit the prefix of the trace illustrating the truth value of a property.



5 Conclusion 

We presented a working solution for model checking large event traces. Based
upon the generic Open/Cæsar [8] framework, this solution relies on a new
software tool, Seq.Open, which enables fast, cache-based handling of large traces
stored in computer files. Combined with already existing components of the
Cadp tool set (such as Exhibitor and Evaluator), Seq.Open allows trace
properties to be verified efficiently.

2 For very specific, application-dependent properties, additional tools could be
developed using the Open/Cæsar environment.

Due to the extreme simplicity of Seq.Open’s line-based trace format, our
solution is not “intrusive”, in the sense that it is easily applicable to most existing
systems without heavy reengineering.

In the setting of hardware systems, our solution was chosen by Bull for val- 
idating the traces produced by the Verilog simulation of the cache coherency 
protocol used in Bull’s Novascale multiprocessor servers. This validation task, 
previously done by human reviewers, is now fully automated with good
performance (7.4 million model checking jobs in 23 hours using a standard 700 MHz
Pentium PC). 

In the setting of software systems, our solution is used to analyze the traces 
produced by a multi-threaded virtual machine, which provides the runtime envi- 
ronment for executing the ArchWare description language for mobile software 
architectures. 

As for future work, we plan to study trace-based verification algorithms that
improve “locality”, i.e., produce fewer faults in the Seq.Open cache table.

Acknowledgements. The authors are grateful to Bruno Ondet (Inria/Vasy)
for his contribution to the implementation of Seq.Open, and to Nicolas Zuanon
and Solofo Ramangalahy (Bull) for their industrial feedback.

1 Introduction

The study of genetic regulatory networks, which underlie the functioning of liv- 
ing organisms, has received a major impetus from the recent development of 
high-throughput genomic techniques. This experimental progress calls for the 
development of appropriate computer tools supporting the analysis of genetic 
regulatory processes. We have developed a modeling and simulation method [5, 
7], based on piecewise-linear differential equations, that is well-adapted to the 
qualitative nature of most available biological data. The method has been im- 
plemented in the tool Genetic Network Analyzer (Gna) [6], which produces a 
graph of qualitative states and transitions between qualitative states. The graph 
provides a discrete abstraction of the dynamics of the system. 

A bottleneck in the application of the qualitative simulation method is the 
analysis of the state transition graph, which is usually too large for visual in- 
spection. In this paper, we propose a model-checking approach to perform this 
task in a systematic and efficient way. Given that certain properties of biological 
interest are of a branching nature (see, e.g., the bistability property in Section 3), 
a branching-time temporal logic is necessary. Also, abstractions of state transi- 
tion graphs can be performed more conveniently by using standard equivalence 
relations defined on Labeled Transition Systems (Ltss) rather than by imple- 
menting ad hoc reductions. Therefore, we developed a connection between the 
qualitative simulator Gna and the widely-used Cadp verification toolbox [8], 
which provides the required analysis functionalities on Ltss. 

The connection is established as follows. Firstly, a dedicated translator con- 
verts the state transition graph resulting from qualitative simulation into an Lts 
suitable for automated verification. Then, after instantaneous states have been 
abstracted away by means of branching bisimulation, various properties charac- 
terizing the evolution of protein concentrations are checked by encoding them 
in regular alternation-free μ-calculus. The diagnostics produced by the Cadp
model checker make it possible to establish a correspondence between verifica- 
tion results and biological data, for instance by characterizing evolutions leading 
to equilibrium states. We illustrate the combined use of qualitative simulation 
and model checking by means of a simple, biologically-inspired example. 



S. Graf and L. Mounier (Eds.): SPIN 2004, LNCS 2989, pp. 158-163, 2004.
© Springer-Verlag Berlin Heidelberg 2004

2 Qualitative Simulation of Genetic Regulatory Networks

We consider qualitative models of genetic regulatory networks, based on a class 
of piecewise-linear differential equations originally proposed in mathematical
biology [10]. Given a qualitative model of a genetic regulatory network, the
qualitative simulation method produces a graph of qualitative states and transitions
between qualitative states, qualitatively summarizing the dynamics of the sys- 
tem [5,7]. In the sequel, we present the method by means of an example. 

Figure 1(a) represents a simple genetic regulatory network consisting of two
genes, a and b, and two proteins, A and B. When a gene is expressed, the
corresponding protein is synthesized, which, in turn, can regulate the expression of
its own and the other gene. For example, when gene a is expressed, protein A
is synthesized and, depending on whether its concentration is above or below a
threshold, it may inhibit the expression of gene a and/or b. This network can be
described by means of the differential equations (1)-(2), where $x_a$ and $x_b$ denote
the concentrations of proteins A and B, $\theta_a^1$, $\theta_a^2$, $\theta_b^1$, and $\theta_b^2$
threshold concentrations, and $s^-$ the decreasing step function. For example, equation (1)
states that protein A is produced (at a rate $\kappa_a$) if and only if
$s^-(x_a, \theta_a^2)\, s^-(x_b, \theta_b^1) = 1$, that is, if and only if $x_a$ and $x_b$ are
below thresholds $\theta_a^2$ and $\theta_b^1$ respectively. In addition, protein A is degraded
at a rate proportional to its own concentration ($\gamma_a > 0$). The parameter inequalities
(3)-(4) constrain the parameter values.



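$$\dot{x}_a = \kappa_a \, s^-(x_a, \theta_a^2) \, s^-(x_b, \theta_b^1) - \gamma_a x_a \qquad (1)$$
$$\dot{x}_b = \kappa_b \, s^-(x_a, \theta_a^1) \, s^-(x_b, \theta_b^2) - \gamma_b x_b \qquad (2)$$
$$0 < \theta_a^1 < \theta_a^2 < \kappa_a / \gamma_a < max_a \qquad (3)$$
$$0 < \theta_b^1 < \theta_b^2 < \kappa_b / \gamma_b < max_b \qquad (4)$$

(b)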

Fig. 1. (a) Example of a genetic regulatory network of two genes, a and b. The no- 
tation follows, in a somewhat simplified form, the graphical conventions proposed by 
Kohn [11]. (b) Qualitative model, corresponding to the two-gene example, composed 
of piecewise-linear differential equations (1)-(2) and parameter inequalities (3)-(4).
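
The right-hand sides of the two-gene model can be written down directly. The
sketch below reads equation (1) with thresholds theta_a2 and theta_b1 and
equation (2) with thresholds theta_a1 and theta_b2; the function names and the
numerical parameter values are hypothetical and were chosen only so that
inequalities (3)-(4) hold. The step function is left undefined at the threshold in
the model; the code arbitrarily returns 0 there.

```python
def step_minus(x, theta):
    """Decreasing step function: 1 below the threshold, 0 above it."""
    return 1.0 if x < theta else 0.0


def derivatives(xa, xb, p):
    """Right-hand sides of equations (1) and (2) at the point (xa, xb)."""
    dxa = (p["kappa_a"] * step_minus(xa, p["theta_a2"]) * step_minus(xb, p["theta_b1"])
           - p["gamma_a"] * xa)
    dxb = (p["kappa_b"] * step_minus(xa, p["theta_a1"]) * step_minus(xb, p["theta_b2"])
           - p["gamma_b"] * xb)
    return dxa, dxb


# Hypothetical values satisfying 0 < theta1 < theta2 < kappa/gamma for both genes.
PARAMS = {
    "kappa_a": 3.0, "gamma_a": 1.0, "theta_a1": 1.0, "theta_a2": 2.0,
    "kappa_b": 3.0, "gamma_b": 1.0, "theta_b1": 1.0, "theta_b2": 2.0,
}
```

At a point where both concentrations are below their first thresholds, both
derivatives are positive, so both proteins accumulate.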

The phase space can be partitioned into (hyper)rectangular regions, called
flow domains, where the flow is qualitatively identical, that is, where the sign
of the derivatives is identical for all solutions (see Figure 2(a)). For example, in
flow domain $D^1 = [0, \theta_a^1[ \times [0, \theta_b^1[$, the expression
$s^-(x_a, \theta_a^2)\, s^-(x_b, \theta_b^1)$ evaluates to 1 and equation (1) becomes
$\dot{x}_a = \kappa_a - \gamma_a x_a$. From the inequalities (3), it follows that
$x_a < \theta_a^1 < \kappa_a / \gamma_a$, so $\dot{x}_a > 0$. To each flow domain $D$
corresponds a qualitative state $QS$ defined as the tuple $(D, S)$, where the vector $S$
represents the derivative sign of solutions in $D$. A qualitative state $QS = (D, S)$ is
called instantaneous if all solutions traverse $D$ instantaneously, and persistent
otherwise. There is a transition from $QS^1 = (D^1, S^1)$ to $QS^2 = (D^2, S^2)$ if there
exists a solution reaching $D^2$ from $D^1$. The set of qualitative states and
transitions between qualitative states together form the state transition graph. The
state transition graph corresponding to our two-gene example, represented in
Figure 2(b), contains 18 persistent qualitative states, including two stable ($QS^6$,
$QS^{22}$) and one unstable ($QS^{12}$) qualitative equilibrium states.

So, using the simulation method sketched above, the qualitative behavior 
emerging from genetic regulatory interactions can be predicted. These results 
are obtained using a version of the Gna tool still under development [6]. The 
publicly-available version of Gna (Gna 5.0) gives similar results, but uses a 
slightly coarser partition of the phase space into domains, which makes the 
interpretation of the properties associated to qualitative states less straightfor- 
ward. 



[Figure 2: panel (a), partition of the phase space into flow domains, horizontal axis
marked 0, $\theta_a^1$, $\theta_a^2$, $max_a$; panel (b), state transition graph.]



Fig. 2. (a) Partition of the phase space into 33 flow domains. Flow domains may be
of dimension 2 (e.g., $D^1$), 1 (e.g., $D^{11}$) or 0 (e.g., $D^{12}$). (b) State transition
graph, corresponding to the two-gene example in Figure 1(a). Filled and unfilled dots
correspond to instantaneous and persistent qualitative states, respectively. Qualitative
equilibrium states are circled in addition.



3 Transformation of Simulation Results into LTSs 

First of all, it is necessary to translate the simulation result into a format suit- 
able for verification. For this purpose, we developed a translator which takes as 
input a state transition graph produced by qualitative simulation and produces 
a corresponding Lts encoded in the Bcg (Binary Coded Graph) format used by
Cadp [8]. 

To each qualitative state corresponds a state with a self-transition (loop) in
the Lts. The label of this loop encodes all the properties of the corresponding
qualitative state: its name, the range and derivative sign of protein
concentrations, and additional properties specifying whether the state is instantaneous,
persistent, or a stable or unstable equilibrium. Each transition between
qualitative states is encoded in the Lts by an invisible transition (labeled by the action
“i”, noted τ in Ccs). Since state transition graphs produced by qualitative
simulation of genetic regulatory networks may be disconnected, we create an initial
state which is linked to all other states in the Lts via special transitions.

Figure 3(a) shows the translation of the persistent qualitative state $QS^1$
associated with the flow domain $D^1$. This qualitative state is encoded in the Lts by
a state with a loop labeled “PERS <[0,ta1[x[0,tb1[> A+ B+”. Three invisible
transitions originate from this state, linking it to the states corresponding to
$QS^3$, $QS^{11}$, and $QS^{12}$.
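
The encoding can be sketched on an in-memory edge list. This is not the translator
described above and it does not produce the Bcg format: the function name to_lts,
the input layout, and the label "INIT" used for the special transitions leaving the
added initial state are all assumptions made for the illustration.

```python
def to_lts(qual_states, qual_transitions):
    """Encode a state transition graph as (initial state, list of labelled edges).

    qual_states: dict mapping a qualitative state name to its property label.
    qual_transitions: iterable of (source name, target name) pairs.
    """
    # State 0 is the fresh initial state; qualitative states are numbered from 1.
    index = {name: i + 1 for i, name in enumerate(sorted(qual_states))}
    edges = []
    for name, label in qual_states.items():
        s = index[name]
        edges.append((0, "INIT", s))  # hypothetical label for the special transitions
        edges.append((s, label, s))   # self-loop carrying the state's properties
    for src, dst in qual_transitions:
        edges.append((index[src], "i", index[dst]))  # invisible transition
    return 0, edges
```

Putting the properties on self-loops keeps all information in the labels, which is
what allows an action-based logic to refer to properties of states.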







Fig. 3. (a) Fragment of the Lts corresponding to the qualitative state $QS^1$ and its
successors. “INST”, “PERS”, “PSEQ” and “PUEQ” denote qualitative states that are
instantaneous, persistent non-equilibrium, persistent stable equilibrium, and
persistent unstable equilibrium, respectively. The derivative sign of each protein
concentration is represented by a sign symbol such as ’+’. (b) Bistability property
formulated in regular alternation-free μ-calculus. (c) Diagnostic produced by
model-checking.



A simplified Lts can be obtained from the original Lts by abstracting away 
the states corresponding to instantaneous qualitative states. By exploiting the 
fact that in the qualitative model there are never two successive instantaneous 
states, we can perform this simplification by hiding all labels corresponding to 
instantaneous qualitative states in the original Lts and minimizing it modulo 
branching bisimulation using the Bcg_Min tool of Cadp. The dynamical prop- 
erties of interest are preserved in the simplified Lts. 

4 Verification Using Temporal Logic 

Once the Lts corresponding to the genetic regulatory network has been gener- 
ated by using Gna together with the translator described in Section 3, we can 
use the model checking technologies available in Cadp to analyze the behaviour 
of the biological system. The methodology adopted consists of two steps:

- First, each desired property is expressed as a formula in regular
alternation-free μ-calculus [12], which is the input language of the Evaluator 3.0 model
checker of Cadp. This temporal logic is a good compromise between
expressive power (it subsumes Ctl and Pdl), user-friendliness (concise formula
descriptions due to regular expressions), and model-checking efficiency
(algorithms linear w.r.t. formula size and Lts size). Also, generic properties
can be encoded as macro definitions and grouped into reusable libraries.

- Second, each property is verified on the Lts using Evaluator 3.0, which
produces diagnostics (counterexamples and witnesses) illustrating its truth
value. The diagnostics obtained, represented as Ltss, can then be inspected
visually using the Bcg_Edit graphical Lts editor of Cadp. Diagnostics can
also be replayed interactively in the Lts by means of the graphical simulator
Ocis of Cadp.

Figure 3 illustrates the verification of the bistability property on the Lts
constructed from the genetic regulatory network given in Figure 1. This property
states that from an initial state $QS^1$ in which both proteins A and B have low
concentrations (below $\theta_a^1$ and $\theta_b^1$), it is possible to reach two different
stable equilibrium states ($QS^{22}$ and $QS^6$) in which only one protein is present and at
a high concentration (at $\theta_a^2$ and at $\theta_b^2$, respectively). The diagnostic
(witness) exhibited by Evaluator 3.0 is an Lts subgraph containing the paths going from
the initial state to the two stable equilibrium states.
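
Read as a graph question, the property asks whether both equilibrium states are
reachable from the chosen state. The sketch below checks exactly that with a
breadth-first search over a list of labelled edges; it is a plain reachability
test standing in for the μ-calculus formula of Figure 3(b), which is not
reproduced here, and the names reachable and bistable are hypothetical.

```python
from collections import deque


def reachable(edges, start):
    """Set of states reachable from `start` along edges (source, label, target)."""
    succ = {}
    for src, _label, dst in edges:
        succ.setdefault(src, []).append(dst)
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in succ.get(todo.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def bistable(edges, start, equilibria):
    """True if every state in `equilibria` can be reached from `start`."""
    return set(equilibria) <= reachable(edges, start)
```

Unlike the model checker, this test returns only a verdict; recording a parent
for each visited state would be needed to recover witness paths.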

Other biologically interesting properties (e.g., reachability of certain
equilibrium states, existence of behaviours satisfying certain constraints on protein
concentrations) can be verified in a similar way. 

5 Conclusion 

Our approach for analyzing biological systems consists in connecting Gna, a 
qualitative simulation tool well-adapted to the available information on genetic 
regulatory networks, to the widely-used Cadp verification toolbox. By
translating the state transition graph produced by Gna into an Lts, standard verification
technologies become available for analyzing the dynamics of the underlying ge- 
netic regulatory network. 

Checking properties of qualitative simulation results using temporal logic was
originally proposed by Shults and Kuipers [14]. Chabrier and Fages [4] and Peres
and Comet [13] have also addressed the formal analysis of genetic regulatory
networks using model-checking approaches, but they use rather simple rule-based
and Boolean models, respectively. Like us, Alur et al. [1], Antoniotti et al. [2]
and Ghosh et al. [9] use hybrid models to analyze biological networks. However,
Alur et al. and Antoniotti et al. use numerical instead of qualitative models.
The most closely-related work is the symbolic reachability analysis of Ghosh et
al., but the authors ignore the problems related to discontinuities in the
right-hand side of the differential equations. Additionally, Gna has been tailored so as
to exploit the favorable mathematical properties of the piecewise-linear models,
which may make it better capable of analyzing large and complex genetic
regulatory networks. In previous work, we used Ctl [3], as in [4] and [13], but the
use of the more expressive and convenient regular alternation-free μ-calculus,
together with the diagnostic generation and interactive simulation facilities offered
by Cadp, makes it possible to easily express properties, interpret the results of
the analysis, and relate them to biological reality.

Abstract. Software verification is recognized as an important and
difficult problem. We present a novel framework, based on symbolic
execution, for the automated verification of software. The framework uses
annotations in the form of method specifications and loop invariants. 

We present a novel iterative technique that uses invariant strengthening 
and approximation for discovering these loop invariants automatically. 

The technique handles different types of data (e.g. boolean and numeric 
constraints, dynamically allocated structures and arrays) and it allows 
for checking universally quantified formulas. Our framework is built on 
top of the Java PathFinder model checking toolset and it was used for 
the verification of several non-trivial Java programs. 

1 Introduction 

Model checking is becoming a popular technique for the verification of soft- 
ware [1,6,30,21], but it typically can only deal with closed systems and it suffers 
from the state-explosion problem. In previous work [23] we have developed a ver- 
ification framework based on symbolic execution [24] and model checking that 
allows the analysis of complex software that takes inputs from unbounded
domains with complex structure, and helps combat state-space explosion. In that
framework, a program is instrumented to add support for manipulating
formulas and for systematic treatment of aliasing, so as to enable a standard model
checker to perform symbolic execution of the program. The framework is built 
on top of the Java PathFinder model checker and it was used for test input 
generation and for error detection in complex Java programs, but it could not 
be used for proving properties of programs containing loops. 

We present here a method that uses the symbolic execution framework pre- 
sented in [23] for proving (light-weight) specifications of Java programs that con- 
tain loops. The method requires annotations in the form of method specifications 
and loop invariants. We present a novel iterative technique that uses invariant 
strengthening and approximation for discovering these loop invariants automati- 
cally. Our technique uniformly handles different types of constraints (e.g. boolean 
and numeric constraints, constraints on dynamically allocated structures and
arrays) and it allows for checking universally quantified formulas. These formulas
are necessary for expressing properties of programs that manipulate unbounded
data, such as arrays.

Our technique for loop invariant generation works backward from the prop- 
erty to be checked and has three basic ingredients: iterative invariant strength- 
ening, iterative approximation and refinement. Symbolic execution is used to 
check that the current invariant is inductive: the base case checks that the cur- 
rent candidate invariant is true when entering the loop and the induction step 
checks that the current invariant is maintained by the execution of the loop body. 
Failed proofs of the induction step are used for iterative invariant strengthening, 
a process that may result in a (possibly infinite) sequence of candidate invari- 
ants. At each strengthening step, we further use a novel iterative approximation 
technique to achieve termination. 
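
The two proof obligations can be illustrated on a toy loop. In the sketch below,
symbolic execution is replaced by brute-force enumeration over a small finite
domain, so it demonstrates only the shape of the base case and of the induction
step, not the technique of this paper; the function is_inductive and the example
loop (i starts at 0 and is incremented while i < N) are assumptions.

```python
def is_inductive(invariant, init_states, guard, body, domain):
    """Return (base case holds, induction step holds) for a candidate invariant.

    Base case: the invariant holds in every state that enters the loop.
    Induction step: from any state of `domain` satisfying the invariant and the
    loop guard, one execution of the body leads to a state satisfying it again.
    """
    base = all(invariant(s) for s in init_states)
    step = all(invariant(body(s)) for s in domain if invariant(s) and guard(s))
    return base, step


N = 5
DOMAIN = range(-3, 10)    # finite stand-in for the unbounded integers
guard = lambda i: i < N   # loop condition
body = lambda i: i + 1    # loop body
```

For this loop the candidate 0 <= i <= N passes both checks, whereas the candidate
i != 3 passes the base case but fails the induction step at i = 2; a failure of
this kind is what triggers strengthening.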

For strengthening step $k$, we use a (finite) set of relevant constraints called the
universe of constraints $U_k$. The iterative approximation consists of a sequence
of strengthenings in which we drop all the constraints that are newly generated
(and are not present in $U_k$). Since $U_k$ is finite, this process is guaranteed to
converge to an inductive approximate invariant that is a boolean combination of
the constraints in $U_k$. The intuition here has similarities to predicate abstraction
techniques [17], which perform iterative computations over a finite set of predicates
(i.e. constraints). A failed base case proof can either indicate that there is an
error in the program or that the approximation that we use at the current step
is too strong, in which case we use refinement, which consists of enlarging the
universe of constraints with new constraints that come from the next candidate
invariant (computed at step $k+1$).

Loop invariant generation has received much attention in the literature, see 
e.g. [8,5,25,31,29]. Most of the methods presented in these papers were con- 
cerned with the generation of numerical invariants. A recent paper [13] describes 
a loop invariant generation method for Java programs that uses predicate ab- 
straction. The method handles universally quantified specifications but it relies 
on user supplied input predicates. We show (in Section 5) how our iterative tech- 
nique discovers invariants for (some of) the examples from [13] without any user 
supplied predicates. 

The main contributions of our work are: 

— A verification framework that combines symbolic execution and model check- 
ing in a novel way; we extend the basic framework presented in [23] with the 
ability to handle arrays symbolically and to prove partial-correctness spec- 
ifications, that may be universally quantified. This results in a flexible and 
powerful tool that can be used for proving program correctness, in addition 
to test input generation and model checking. 

— A new method for iterative invariant generation. The method handles uni- 
formly different types of constraints (e.g. boolean and numeric constraints, 
arrays and objects) and it can be used in conjunction with more powerful 
approximation methods (e.g. widening [9,7]). 

— A series of (small) non-trivial Java examples showing the merits of our 
method; our method extends to other languages and model checkers. 
// @ precondition: a != null;
void example(int[] a) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert a[0] == 0;
}

Fig. 1. Motivating example



Section 2 shows an example analysis in our framework. Section 3 gives back- 
ground on symbolic execution and it describes our symbolic execution frame- 
work for Java programs. Section 4 gives our method for proving properties of 
Java programs using symbolic execution and invariant generation and Section 5 
illustrates its application to the verification of several non-trivial Java programs. 
We give related work in Section 6 and conclude in Section 7. 



2 Example 

We illustrate our verification framework using the code shown in Figure 1. This 
method takes as a parameter an array of integers a and it sets all the elements 
of a to zero. This method has a precondition that its input is not null. The 
assert clause declares a partial correctness property that states that after the 
execution of the loop, the value of the first element in a is zero. 

Using the loop invariant $i \geq 0$, our framework can be used to automatically
check that there are no array bounds violations. This is a simple invariant that
can be stated without much effort. In order to prove that there are no assertion
violations, a more complex loop invariant is needed:
$\neg(a[0] \neq 0 \wedge i > 0)$.

Constructing this loop invariant requires ingenuity. Our framework discovers
this invariant by iterative approximation. It starts with
$I_0 = \neg(a[0] \neq 0 \wedge i \geq a.\mathit{length})$, which is the weakest
possible invariant that is necessary to prove that the assertion is not violated.
When checking this invariant to see if it is inductive we find a violation: if the
formula $(i + 1) \geq a.\mathit{length} \wedge a[0] \neq 0 \wedge 0 < i < a.\mathit{length}$
holds at the beginning of the loop, then $I_0$ does not hold at the end of the loop.
At the next iteration, we strengthen $I_0$ using
$a[0] \neq 0 \wedge 0 < i < a.\mathit{length}$ (i.e. we drop the new constraint
$(i + 1) \geq a.\mathit{length}$ that is due to the iterative computation in the
loop body). This yields the formula
$\neg(a[0] \neq 0 \wedge i \geq a.\mathit{length}) \wedge \neg(a[0] \neq 0 \wedge 0 < i < a.\mathit{length})$,
which simplifies to the desired invariant.
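The inductiveness claims above can be cross-checked on small concrete inputs. The following is only a sketch under stated assumptions: it replaces symbolic execution with brute-force enumeration of arrays of length 1 to maxLen with entries in {0, 1}, and the class and method names (ZeroFirstInvariant, isInductive) are hypothetical and not part of the framework described here.

```java
import java.util.function.BiPredicate;

final class ZeroFirstInvariant {
    // Initial candidate: not(a[0] != 0 and i >= a.length).
    public static boolean i0(int[] a, int i) {
        return !(a[0] != 0 && i >= a.length);
    }

    // Strengthened candidate: not(a[0] != 0 and i > 0).
    public static boolean strengthened(int[] a, int i) {
        return !(a[0] != 0 && i > 0);
    }

    // Induction step on a bounded domain: for every state that satisfies the
    // invariant and the loop condition, the invariant must hold again after
    // one execution of the loop body (a[i] = 0; i++).
    public static boolean isInductive(BiPredicate<int[], Integer> inv, int maxLen) {
        for (int len = 1; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                for (int i = 0; i < len; i++) {           // loop condition holds
                    int[] a = new int[len];
                    for (int k = 0; k < len; k++) a[k] = (bits >> k) & 1;
                    if (!inv.test(a, i)) continue;        // assume the invariant
                    a[i] = 0;                             // loop body
                    if (!inv.test(a, i + 1)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("I0 inductive: " + isInductive(ZeroFirstInvariant::i0, 3));
        System.out.println("strengthened inductive: "
                + isInductive(ZeroFirstInvariant::strengthened, 3));
    }
}
```

On this bounded domain the initial candidate fails the induction step (for example with a.length = 2, i = 1 and a[0] = 1), while the strengthened candidate passes. A bounded check of this kind can only refute a candidate; it does not replace the proof.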

Now suppose we want to verify an additional assertion, which states that,
after the execution of the loop, every element in the array a is set to zero:
$\forall\, \mathit{int}\ j : a[j] = 0$. This assertion is universally quantified; it
refers to the quantified variable j as well as to the program variables. We model
it by introducing a symbolic constant j, which is a new variable that is not
mentioned elsewhere in the program and that is assigned a new, unconstrained
symbolic value. Our
       int x, y;
    1: if (x > y) {
    2:     x = x + y;
    3:     y = x - y;
    4:     x = x - y;
    5:     if (x > y)
    6:         assert(false);
       }

                     x: X, y: Y
                     PC: true
                  1 /          \ 1
        x: X, y: Y              x: X, y: Y
        PC: X>Y                 PC: X<=Y
            | 2
        x: X+Y, y: Y
        PC: X>Y
            | 3
        x: X+Y, y: X
        PC: X>Y
            | 4
        x: Y, y: X
        PC: X>Y
         5 /       \ 5
    x: Y, y: X          x: Y, y: X
    PC: X>Y & Y>X       PC: X>Y & Y<=X
    FALSE!

Fig. 2. Code that swaps two integers and the corresponding symbolic execution tree,
where transitions are labeled with program control points



symbolic execution framework automatically infers the loop invariant:
$\neg(a[j] \neq 0 \wedge i \geq a.\mathit{length} \wedge 0 \leq j < a.\mathit{length}) \wedge \neg(a[j] \neq 0 \wedge j < i \wedge 0 \leq i, j < a.\mathit{length})$.

Since the symbolic constant j represents some fixed unknown value, this
invariant is valid for any value of j. This technique is crucial for checking programs
that manipulate unbounded data, such as arrays [13].

3 Symbolic Execution in Java PathFinder 

In this section we give some background on symbolic execution and we present 
the symbolic execution framework used for reasoning about Java programs. 

3.1 Background: Symbolic Execution 

The main idea behind symbolic execution [24] is to use symbolic values, instead 
of actual data, as input values, and to represent the values of program variables 
as symbolic expressions. As a result, the output values computed by a program 
are expressed as a function of the input symbolic values. 

The state of a symbolically executed program includes the (symbolic) values 
of program variables, a path condition (PC) and a program counter. The path 
condition is a (quantifier-free) boolean formula over the symbolic inputs; it
accumulates constraints which the inputs must satisfy in order for an execution
to follow the particular associated path. The program counter defines the next 
statement to be executed. A symbolic execution tree characterizes the execution 
paths followed during the symbolic execution of a program. The nodes represent 
program states and the arcs represent transitions between states. 

Consider the code fragment in Figure 2, which swaps the values of integer 
variables x and y, when x is greater than y. Figure 2 also shows the corresponding 
symbolic execution tree. Initially, PC is true and x and y have symbolic values
X and Y, respectively. At each branch point, PC is updated with assumptions
about the inputs, in order to choose between alternative paths. For example,
after the execution of the first statement, both then and else alternatives of the
if statement are possible, and PC is updated accordingly. If the path condition
becomes false, i.e., there is no set of inputs that satisfy it, this means that the
symbolic state is not reachable, and symbolic execution does not continue for
that path. For example, statement (6) is unreachable.
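The path followed through the swap code can be replayed with a very small symbolic interpreter. This is a sketch under stated assumptions, not the framework's implementation: symbolic values are restricted to linear combinations of X and Y, and satisfiability of the path condition is decided by brute force over a small input range, which stands in for a real decision procedure. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SwapSymbolic {
    /** A symbolic value cx*X + cy*Y over the symbolic inputs X and Y. */
    static final class Lin {
        final int cx, cy;
        Lin(int cx, int cy) { this.cx = cx; this.cy = cy; }
        Lin plus(Lin o)  { return new Lin(cx + o.cx, cy + o.cy); }
        Lin minus(Lin o) { return new Lin(cx - o.cx, cy - o.cy); }
        int eval(int x, int y) { return cx * x + cy * y; }
    }

    /** The constraint left > right (gt == true) or left <= right (gt == false). */
    static final class Constraint {
        final Lin left, right;
        final boolean gt;
        Constraint(Lin left, Lin right, boolean gt) {
            this.left = left; this.right = right; this.gt = gt;
        }
        boolean holds(int x, int y) {
            return (left.eval(x, y) > right.eval(x, y)) == gt;
        }
    }

    /** Brute-force satisfiability over inputs in [-bound, bound] (assumption). */
    public static boolean isSat(List<Constraint> pc, int bound) {
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                boolean ok = true;
                for (Constraint c : pc) ok &= c.holds(x, y);
                if (ok) return true;
            }
        }
        return false;
    }

    /** True if the path through both then-branches, ending at assert(false), is feasible. */
    public static boolean assertReachable(int bound) {
        Lin x = new Lin(1, 0);                 // x: X
        Lin y = new Lin(0, 1);                 // y: Y
        List<Constraint> pc = new ArrayList<>();
        pc.add(new Constraint(x, y, true));    // statement 1, then-branch: X > Y
        if (!isSat(pc, bound)) return false;
        x = x.plus(y);                         // statement 2, x: X+Y
        y = x.minus(y);                        // statement 3, y: X
        x = x.minus(y);                        // statement 4, x: Y
        pc.add(new Constraint(x, y, true));    // statement 5, then-branch: Y > X
        return isSat(pc, bound);               // X > Y and Y > X is unsatisfiable
    }

    public static void main(String[] args) {
        System.out.println("assert(false) reachable: " + assertReachable(10));
    }
}
```

The second branch adds Y > X to a path condition that already contains X > Y, so the satisfiability check fails and the path to statement (6) is pruned.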



3.2 Generalized Symbolic Execution 

In [23] we describe an algorithm for generalizing traditional symbolic execution 
to support advanced constructs of modern programming languages, such as Java 
and C++. The algorithm handles dynamically allocated structures (e.g., lists and 
trees), method preconditions (e.g., acyclicity of lists), data (e.g., integers and 
strings) and concurrency. Partial correctness properties are given as assertions 
in the program and temporal specifications. We have since extended the work 
in [23] by adding support for symbolic execution of arrays and for checking 
quantified formulas. 



3.3 Symbolic Execution Framework 

Our symbolic execution framework automates test case generation and allows 
model checking concurrent programs that take inputs from unbounded domains 
with complex structure. To enable a model checker to perform symbolic execu- 
tion, the original program is instrumented by doing a source to source translation 
that adds nondeterminism and support for manipulating formulas that represent 
path conditions. The model checker checks the instrumented program using its 
usual state space exploration techniques — essentially, the model checker ex- 
plores the symbolic execution tree of the program. A state includes a heap con- 
figuration, a path condition on primitive fields, and thread scheduling. Whenever 
a path condition is updated, it is checked for satisfiability using an appropriate 
decision procedure, such as the Omega library [27] for linear integer constraints. 
If the path condition is unsatisfiable, the model checker backtracks. 

Note that performing (forward) symbolic execution on programs with loops 
can explore infinite execution trees. This is why, for systematic state space ex- 
ploration, the framework presented in [23] uses depth first search with iterative 
deepening or breadth first search. The framework can be used for test input 
generation and for finding counterexamples to safety properties. If there is an 
upper bound on the number of times each loop in the program may be executed, 
the framework can also be used for proving correctness, since the corresponding 
symbolic execution tree is finite. 

However, for most programs, no fixed bound on the number of times each loop 
is executed exists and the corresponding execution trees are infinite. In order to 
prove the correctness of such programs, we have extended our framework with 
void example() {
    IntArrayStructure a = new IntArrayStructure();
    Expression i = new IntegerConstant(0);
    while (Expression.pc._update_LT(i, a.length)) {
        a._set(i, new IntegerConstant(0));
        i = i._plus(new IntegerConstant(1));
    }
    assert Expression.pc._update_EQ(a._get(new IntegerConstant(0)), 0);
}

Fig. 3. Instrumented code



the ability of traversing the symbolic execution tree inductively rather than 
explicitly, using loop invariants (as presented in the next section). 

3.4 Java PathFinder 

Our framework uses the Java PathFinder (JPF) [30] model checker to analyze
the instrumented programs. As a decision procedure, the framework uses a Java
implementation of the Omega library.

JPF is an explicit-state model checker for Java programs that is built on top
of a custom-made Java Virtual Machine (JVM). Since it is built on a JVM, it
can handle all of the language features of Java; in addition, it treats
nondeterministic choice expressed in annotations of the program being analyzed.
Annotations are added to the programs through method calls to a special class
Verify. These features (Verify.choose_boolean() and Verify.choose(n))
for adding nondeterminism are used to implement the updating of path
conditions. JPF also supports a program annotation that forces the search to
backtrack (Verify.ignoreIf(condition)) when a certain condition evaluates to
true; this is used to stop the analysis of infeasible paths (when path conditions
are found to be unsatisfiable).

3.5 Instrumentation 

The interested reader is referred to [23] for a detailed description of how the
code is instrumented for symbolic execution; here we just highlight some key
new features.

The main idea is to replace concrete types with corresponding “symbolic 
types” (i.e. library classes that we provide) and concrete operations with method 
calls that implement “equivalent” operations on symbolic types. As an illustra- 
tion of the instrumentation, consider the code from Figure 1. Figure 3 gives 
part of the resulting code after instrumentation and Figure 4 gives part of the li- 
brary classes that we provide. Classes Expression and IntArrayStructure sup- 
port manipulation of symbolic integers and symbolic integer arrays, respectively. 
The static field Expression.pc stores the (numeric) path condition. Method
_update_LT makes a nondeterministic choice (i.e., a call to choose_boolean) to
class Expression { ...
    static PathCondition pc;
    Expression _plus(Expression e) {
        ... }
}

class PathCondition { ...
    Constraints c;

    boolean _update_LT(Expression l, Expression r) {
        // nondeterministically choose the outcome of the comparison
        boolean result = Verify.choose_boolean();
        if (result)
            c.add_constraint_LT(l, r);
        else
            c.add_constraint_GE(l, r);
        // backtrack if the path condition is no longer satisfiable
        Verify.ignoreIf(!c.is_sat());
        return result;
    }
}

class IntArrayStructure {
    Vector _v;
    Expression length;

    ArrayCell _new_ArrayCell(Expression idx) {
        // reuse a previously created cell if the indices may be equal
        for (int i = 0; i < _v.size(); i++) {
            ArrayCell cell = (ArrayCell) _v.elementAt(i);
            if (Expression.pc._update_EQ(cell.idx, idx))
                return cell;
        }
        // otherwise lazily create a new cell
        ArrayCell t = new ArrayCell(length, idx, name);
        _v.add(t);
        return t;
    }

    public Expression _get(Expression idx) {
        // array bounds check
        assert (Expression.pc._update_GE(idx, 0) &&
                Expression.pc._update_LT(idx, length));
        ArrayCell cell = _new_ArrayCell(idx);
        return cell.elem;
    }
}

Fig. 4. Library classes



add to the path condition the constraint or the negation of the constraint its in- 
vocation expresses and returns the corresponding boolean. Method is_sat uses 
the Omega library to check if the path condition is infeasible (in which case, JPF 
will backtrack). Method _plus constructs a new Expression that represents the 
sum of its input parameters. IntegerConstant is a subclass of Expression and 
wraps concrete integer values. 

To store the input array elements that are created as a result of a lazy 
initialization, we use a variable of class Vector, for each input array. The _get 
and _set methods use the elements in this vector to systematically initialize 
input array elements. When the execution accesses a symbolic array cell, the 
algorithm nondeterministically initializes it to a new cell or to a cell that was 
created during a prior cell initialization. The assertion checks in the _get/_set 
methods establish that there are no array out of bounds errors. 



4 Proving Properties of Java Programs 

In this section we present a Floyd-Hoare style method [14,20,18] for proving light- 
weight properties of Java programs. The method requires loop invariants and we 
present a novel iterative technique for discovering (some of) them automatically. 



4.1 Proving Properties Using Symbolic Execution 

For simplicity of presentation, we illustrate our methodology on a single-loop 
program such as the one in Figure 5 (left); multiple loops can be treated similarly, 
see e.g. [31]. The program consists of some (loop-free) initialization code, a loop 
with condition C and (loop-free) body and post condition P. 

To verify the program, it suffices to find a loop invariant I, i.e. a formula
that is true when entering the loop, re-entering the loop during its iteration and
init;                      1: init;
while (C) {                2: assert I;    /* base case */
    body;                  3: symbolic variables in B;
}                          4: assume I;
assert P;                  5: if (C) {
                           6:     B;
                                  // oldPC
                           7:     assert I;    /* induction step */
                                  // PC
                              }
                           8: else
                           9:     assert P;

Fig. 5. Single loop program (left) and instrumented program for proof (right)



exiting the loop [18]. Moreover, I must be strong enough to produce verifiable 
results (hence a loop invariant true is, in general, not sufficient). In a symbolic 
execution framework, this amounts to checking the three assertions in the modi- 
fied program in Figure 5 (right). Here, we replaced the while statement with an 
if statement; this is equivalent to placing a “cut” in the loop [18]. At this cut 
point, we consider all the variables that are modified in the loop body initialized 
to new symbolic values, and the path condition initialized to true. Note that a 
symbolic execution from this point on is representative of an arbitrary number 
of loop unrollings; the “input variables” at the cut point are the variables that 
are modified by the loop body and their new symbolic values represent all cases. 
Since the program loop has been cut, this symbolic execution will terminate and
have a finite symbolic execution tree.

We check for three assertions:

— the assertion at line (2) is the base case of the inductive argument and checks
that I holds when entering the loop

— the assertion at line (7) is the induction step and checks that, assuming I 
holds at the beginning of the loop, I also holds after the execution of the 
loop body (i.e. I is inductive) 

— the assertion at line (9) checks that I is strong enough for the property to
hold (i.e. $I \wedge \neg C \rightarrow P$)

If there are no assertion violations in the loop-free program of Figure 5 (right),
then the program of Figure 5 (left) does not violate the property P. With this 
technique, we can verify properties of complex Java programs using the symbolic 
execution framework presented in Section 3. However, the technique requires the 
generation and use of loop invariants. 
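The three checks can be phrased generically. The sketch below is an illustration only: it replaces the fresh symbolic values introduced at the cut point with an explicit, finite list of concrete states, and the names CutPointCheck, check and demo are hypothetical. The demo loop (x = 0; while (x < 5) x++; assert x == 5) is also made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class CutPointCheck {
    /**
     * Checks the base case, the induction step and the postcondition.
     * Returns "ok" or the name of the first check that fails.
     */
    public static <S> String check(List<S> initStates, List<S> cutStates,
                                   Predicate<S> inv, Predicate<S> cond,
                                   UnaryOperator<S> body, Predicate<S> post) {
        for (S s : initStates) {
            if (!inv.test(s)) return "base case";            // assert I after init
        }
        for (S s : cutStates) {                               // arbitrary state at the cut
            if (!inv.test(s)) continue;                       // assume I
            if (cond.test(s)) {
                if (!inv.test(body.apply(s))) return "induction step";
            } else if (!post.test(s)) {
                return "postcondition";                       // I and not C must imply P
            }
        }
        return "ok";
    }

    /** Demo loop: x = 0; while (x < 5) x++; assert x == 5; with x ranging over -2..8. */
    public static String demo(Predicate<Integer> inv) {
        List<Integer> cutStates = new ArrayList<>();
        for (int x = -2; x <= 8; x++) cutStates.add(x);
        return check(List.of(0), cutStates, inv,
                x -> x < 5, x -> x + 1, x -> x == 5);
    }

    public static void main(String[] args) {
        System.out.println(demo(x -> x <= 5));   // a sufficient invariant
        System.out.println(demo(x -> true));     // the trivial invariant is too weak
    }
}
```

With the invariant x <= 5 all three checks pass, whereas the invariant true is inductive but fails the postcondition check, which matches the remark that true is in general not sufficient.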

4.2 Invariant Generation 

The generation of loop invariants is an intricate problem that often requires a 
deep understanding of how these loops work. 

We propose a novel technique for generating these loop invariants automat- 
ically. The technique works backward, starting from the property to be proved 
and it has three basic ingredients: iterative invariant strengthening, iterative 
approximation and refinement. 
Iterative Invariant Strengthening. Consider again the example in Figure 5.
The check for the actual property (i.e. the assertion at line (9)) is used for defining
the initial candidate invariant; the weakest possible choice is
$I_0 = \neg(\neg C \wedge \neg P)$. If the base case fails for this candidate
invariant, then the program is not verifiable (i.e. it has an assertion violation).

Checking the inductive step generates all the symbolic paths for the loop
body. If for some of these paths, the invariant is not inductive, then it must
be replaced by a stronger invariant. Assume $PC_1, PC_2, \ldots, PC_n$ are the path
conditions for the paths on which the verification of the induction step fails.
These path conditions characterize all the “inputs” to the loop body for which
the check for the inductive step fails. The invariant is strengthened by replacing
it with $I_1 = I_0 \wedge \neg PC_1 \wedge \neg PC_2 \wedge \ldots \wedge \neg PC_n$, and
the base case and the inductive step are checked again.

If applied repeatedly, this process can introduce infinitely many new
constraints, hence it can lead to an infinite sequence of exact candidate
invariants (see footnote 1) $I_1, I_2, \ldots$ We propose to use a simple, but
powerful approximation technique to help termination.
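On a finite state space, exact strengthening can be written down directly. The following sketch swaps in an explicit-state representation for the paper's constraint-based one: an invariant is a set of concrete states, and "conjoining the negated failing path conditions" becomes removing the failing states. Because the domain is finite the loop always terminates here, which is exactly what cannot be assumed in the symbolic setting. The class name, the demo loop and the bounded domain are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class ExactStrengthening {
    /**
     * Returns the number of strengthening rounds needed to reach an inductive
     * invariant, or -1 if the base case fails for some exact candidate.
     */
    public static <S> int strengthen(List<S> domain, List<S> init, Predicate<S> cond,
                                     UnaryOperator<S> body, Predicate<S> post) {
        // I_0 = not(not C and not P), stored as the set of states that satisfy it.
        Set<S> inv = new HashSet<>();
        for (S s : domain) {
            if (cond.test(s) || post.test(s)) inv.add(s);
        }
        for (int round = 0; ; round++) {
            for (S s : init) {
                if (!inv.contains(s)) return -1;            // base case fails
            }
            Set<S> failing = new HashSet<>();
            for (S s : inv) {                                // induction step
                if (cond.test(s) && !inv.contains(body.apply(s))) failing.add(s);
            }
            if (failing.isEmpty()) return round;             // invariant is inductive
            inv.removeAll(failing);                          // next exact candidate
        }
    }

    /** Demo: x = 0; y = 0; while (x < 3) { x++; y++; } assert y == expectedY; */
    public static int demo(int expectedY) {
        List<List<Integer>> domain = new ArrayList<>();
        for (int x = 0; x <= 4; x++) {
            for (int y = 0; y <= 4; y++) domain.add(List.of(x, y));
        }
        return strengthen(domain, List.of(List.of(0, 0)),
                s -> s.get(0) < 3,
                s -> List.of(s.get(0) + 1, s.get(1) + 1),
                s -> s.get(1) == expectedY);
    }

    public static void main(String[] args) {
        System.out.println("assert y == 3: rounds = " + demo(3));
        System.out.println("assert y == 2: rounds = " + demo(2));
    }
}
```

States whose successor leaves the bounded domain are treated as failing, which is an artifact of the finite domain and not of the method. For the correct assertion the procedure converges to an inductive set containing the initial state; for the wrong assertion the initial state is eventually removed and the base case fails.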



Iterative Approximation. At each step $k \geq 0$, we apply our approximation
phase for the current candidate invariant $I_k$. We should first observe that
symbolically executing the assumption and the body of the loop once (i.e. executing
lines (4) through (6) in the code of Figure 5 (right)) will generate a finite
number of symbolic execution paths, which contain a finite number of constraints; we
call these constraints the universe of constraints $U_k$ at step k. $U_k$ contains the
constraints from the current invariant together with the constraints generated
by symbolically executing the loop body. New constraints (that are not in $U_k$)
may get generated by the symbolic execution of the assertion at line (7).

Let PC be a path condition for some path in the loop body after checking
and discovering a violation for the assertion at line (7), and let oldPC be the
path condition for the same path in the loop body, before checking the assertion.
As we said, checking for the assertion itself can potentially add new constraints
to the path condition (i.e. the set of constraints accumulated in oldPC is a
subset of the set of constraints in PC). In the approximation phase, instead
of strengthening the invariant using PC, we use oldPC, which is weaker than
PC (i.e. $PC \rightarrow oldPC$); this has the effect of obtaining a stronger invariant.
In other words, our approximation consists of a strengthening step in which we
drop all the newly generated constraints (i.e. constraints that are present in PC
but not in oldPC, and hence not in $U_k$). The approximation phase generates
a sequence of approximate candidate invariants $I_k^1, I_k^2, \ldots$; since there are
only a finite number of constraints in $U_k$, this process is guaranteed to terminate,
yielding an inductive invariant $I_k^l$, for some $l > 0$. $I_k^l$ is a boolean
combination of the constraints contained in $U_k$.

Footnote 1: We distinguish between exact candidate invariants, which are generated
during iterative invariant strengthening, and approximate candidate invariants,
which are generated during iterative approximation. If the base case fails for an
exact invariant, then the program is not verifiable. But if the base case fails for
an approximate invariant, this might indicate that the approximation was too
coarse, so it needs refinement.
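The dropping step itself is a set operation. The sketch below assumes, purely for illustration, that a path condition is stored as a set of atomic constraints written as strings; the framework's actual representation is not shown in this text. The constraint strings are taken from the motivating example of Section 2, and the contents of the universe are an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DropNewConstraints {
    /** Keeps only those constraints of pc that already belong to the universe. */
    public static Set<String> approximate(Set<String> pc, Set<String> universe) {
        Set<String> kept = new LinkedHashSet<>(pc);
        kept.retainAll(universe);
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical universe: constraints of the current invariant and of the loop body.
        Set<String> universe = new LinkedHashSet<>(List.of(
                "a[0] != 0", "i >= a.length", "0 < i", "i < a.length"));
        // Path condition after the failed check of the induction step.
        Set<String> pc = new LinkedHashSet<>(List.of(
                "(i + 1) >= a.length", "a[0] != 0", "0 < i", "i < a.length"));
        // The newly generated constraint (i + 1) >= a.length is dropped.
        System.out.println(approximate(pc, universe));
    }
}
```

The result plays the role of oldPC: it is weaker than PC, so its negation yields a stronger candidate invariant.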



Refinement. If the base case fails for an approximate invariant, this may be
because the approximation is too strong. This means that the universe of
constraints $U_k$ is too coarse for proving the property and it needs to be refined. A
simple refinement that we use is to consider $U_{k+1}$ whenever the base case fails
for an approximate invariant. This amounts to backtracking to the candidate
invariant $I_k$, computing the next exact candidate invariant $I_{k+1}$ and applying the
approximation phase at the next iteration. Note that since the set of constraints
in $I_k$ is a subset of the set of constraints in $I_{k+1}$, we have that
$U_k \subseteq U_{k+1}$, and hence $U_{k+1}$ will yield finer approximation steps. We
should also note that if the program has an error, it will eventually be caught
when the proof of the base case fails for an exact invariant.



Description of General Verification Method. Now that we have seen the 
basic ingredients, here is how the general method for checking properties works. 
We use the check for the actual property to come up with the initial candidate
invariant $I_0$. We then check the base case and the inductive step for this invariant.

— if both these checks yield no errors, then we are done, the result is that the 
property holds for the program and the current invariant is inductive 

— if the inductive step fails, we apply iterative approximation to get a stronger 
invariant and we go back to checking the base case and the inductive step 

— if the base case fails and the current candidate invariant is exact, then we 
are done, and the result is that the property does not hold for the program; 
if the base case fails and the current candidate invariant is approximate, we 
apply refinement and we check again the base case and the inductive step 

If there is an error in the program, our method is guaranteed to terminate, 
reporting the error. However, if the program is correct with respect to the given 
property, this iterative method might not terminate (and the refinement might 
continue indefinitely) . 

4.3 Illustration 

Consider again our motivating example program from Section 2. The program 
is instrumented to allow symbolic verification and inductive reasoning, as illus- 
trated in Figure 6. Any assertion violation triggers an AssertionError excep- 
tion, which is caught by the program (see lines (3) and (17)-(19) in the instru- 
mented code). Variable oldPC stores the value of the path condition before the 
check of the inductive step; the value of oldPC is used in the approximation 
phase for invariant strengthening. Model checking the program using JPF prints 
all the path conditions PC (together with oldPC) for the assertion violations. 

We first check the assertion at line (15), which fails. The initial candidate
invariant is then $I_0 = \neg(a[0] \neq 0 \wedge i \geq a.\mathit{length})$. We now instrument this
void example() {
 1:   IntArrayStructure a = new IntArrayStructure();
 2:   Expression i = new IntegerConstant(0);
 3:   try {
 4:     assert (I);                        /* base case */
 5:     i = new SymbolicInteger();
 6:     Expression j = new SymbolicInteger();
 7:     Verify.ignoreIf(!I);               /* assume I */
 8:     if (Expression.pc._update_LT(i, a.length)) {
 9:       a._set(i, new IntegerConstant(0));
10:       i = i._plus(new IntegerConstant(1));
11:       ... // oldPC = PC;
12:       assert (I);                      /* induction step */
13:     }
14:     else
15:       assert Expression.pc._update_EQ(a._get(new IntegerConstant(0)), 0);
16:       // assert Expression.pc._update_EQ(a._get(j), 0);
17:   } catch (AssertionError e) {
18:     ... // print oldPC;
19:     ... // print PC;
      } }

Fig. 6. Motivating example - verification (excerpts)



formula to enable symbolic execution and add it at lines (4), (7) and (12), then
we model check the program and we find a counterexample for the following
path condition(s):

$PC = (i + 1) \geq a.\mathit{length} \wedge i > 0 \wedge a[0] \neq 0$;

$oldPC = i > 0 \wedge a[0] \neq 0$.

At this point we use iterative approximation, and we use oldPC for
strengthening the invariant (i.e. we drop the newly generated constraint
$(i + 1) \geq a.\mathit{length}$ from PC), yielding the new candidate invariant:
$I_0^1 = I_0 \wedge \neg(i > 0 \wedge a[0] \neq 0)$.
This invariant suffices to prove the property.

In order to check the additional assertion $\forall j : a[j] = 0$, we declare a new
symbolic variable j (at line (6)) and we check for the assertion at line (16),
which is instrumented for symbolic execution. The initial candidate invariant is
$I_0 = \neg(a[j] \neq 0 \wedge i \geq a.\mathit{length} \wedge 0 \leq j < a.\mathit{length})$.
Model checking the program using this additional invariant gives a counterexample
for the following path condition(s):

$PC = (i + 1) \geq a.\mathit{length} \wedge a[j] \neq 0 \wedge j < i \wedge 0 \leq i, j < a.\mathit{length}$;

$oldPC = a[j] \neq 0 \wedge j < i \wedge 0 \leq i, j < a.\mathit{length}$.

Using oldPC for strengthening the invariant, we get
$I_0^1 = I_0 \wedge \neg(a[j] \neq 0 \wedge j < i \wedge 0 \leq i, j < a.\mathit{length})$,
which suffices to prove the property.



4.4 Discussion 

We have presented a method that extends the framework presented in [23] with 
the ability of proving partial-correctness specifications. This yields a flexible 
framework for checking Java programs. The general methodology for using our 
framework is to first use it as a model checker, using depth first search with 
iterative deepening or breadth first search. 







If no errors are found up to a certain depth, then there is some confidence 
that the program is correct (with respect to the given property), and a proof of 
correctness can be attempted using the method presented in this section. If an 
error is still present after the model checking phase, it will be found as a base 
case violation for an exact candidate invariant. 

Our approximation consists of dropping newly generated constraints; a
potentially more powerful, but more expensive, approximation would be to replace
them, instead of dropping them, with an appropriate boolean combination
of existing constraints from $U_k$. This has some similarities with predicate
abstraction techniques and we would like to investigate this further. Our technique
can also be used in conjunction with other, more powerful methods [9,8,7,32].

Our current system is not fully automated; although we discover all path 
conditions that lead to an assertion violation automatically, we combine the 
conditions by hand into a candidate invariant and add it back to the code to check 
if it is inductive. An implementation of these features is currently underway. 

Traditionally, invariant generation has been performed using iterative for- 
ward and backward traversal, using different heuristics for terminating the it- 
eration; e.g. convergence can be accelerated by using auxiliary invariants (i.e. 
already proved invariants or structural invariants obtained by static analy- 
sis) [25,31,3,29,4,16,19]. Abstract interpretation introduced the widening op- 
erator, which was used to compute fixpoints systematically [8,7,9]. Alternative 
methods [5] use constraint-based techniques for numeric invariant generation.

Most of these methods use techniques that are domain specific. Our method 
for invariant generation uniformly treats different kinds of constraints. Our 
method could be viewed as an iterative-deepening search of a sufficient set of 
constraints that could express an invariant that is strong enough for verifying 
the property. Each step in this search is guaranteed to terminate, but deepening 
(refinement) may be non terminating. 

5 Experiments 

This section shows the application of our framework to the verification of several 
non-trivial Java programs. We compare our work with the invariant generation 
method presented in [13]. We also show an example for which our method is not 
able to infer a loop invariant, in which case it can benefit from more powerful 
approximation techniques. 

Method find. Figure 7 shows an example adapted from [13]. Method find
takes as parameters an array of integers a and an array of booleans b. The
method returns the index of the first non-zero element of a if one exists and
a.length otherwise. The method also sets the i-th element of b to true if the
i-th element of a is nonzero, and to false otherwise. The preconditions of the
method state that the arrays are not null and of the same length. The assertion
states that the index to be returned (spot) is either a.length or b is true at
that index.
// @ precondition: a != null && b != null && a.length == b.length;
int find(int[] a, boolean[] b) {
    int spot = a.length;
    for (int i = 0; i < a.length; i++) {
        if (spot == a.length && a[i] != 0)
            spot = i;
        b[i] = (a[i] != 0);
    }
    assert (spot == a.length || b[spot]);
    return spot;
}

Fig. 7. Method find



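To check that there are no assertion and array bounds violations, our
framework infers the following invariant ($k = 0$, two approximation steps):

$\neg(i < 0) \wedge \neg(i \geq a.\mathit{length} \wedge 0 \leq \mathit{spot} < a.\mathit{length} \wedge \neg b[\mathit{spot}]) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge \mathit{spot} = i \wedge a[i] = 0) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge 0 \leq \mathit{spot} < i \wedge \neg b[\mathit{spot}]) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge i < \mathit{spot} < a.\mathit{length})$.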

This invariant is sufficient to prove the property. As in [13] we checked an
additional assertion, which states that, at the end of the method execution, every
element of b before spot contains false:
$\forall\, \mathit{int}\ j : 0 \leq j < \mathit{spot} \rightarrow \neg b[j]$.

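To prove that this assertion holds, our framework generates the following
additional invariant:

$\neg(i \geq a.\mathit{length} \wedge 0 \leq j < \mathit{spot} \wedge \mathit{spot} \leq a.\mathit{length} \wedge b[j]) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge 0 \leq j < i \wedge \mathit{spot} = a.\mathit{length} \wedge b[j]) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge 0 \leq j < \mathit{spot} \wedge \mathit{spot} = i \wedge b[j] \wedge a[i] \neq 0) \wedge$
$\neg(0 \leq i < a.\mathit{length} \wedge 0 \leq j < \mathit{spot} \wedge \mathit{spot} < i \wedge b[j] \wedge b[\mathit{spot}])$.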

The method presented in [13] starts with a set of “interesting” predicates
provided by the user and performs iterative forward abstract computations to
compute a loop invariant as a combination of these predicates. For proving the
first assertion in the example above, the method requires three predicates:
$\mathit{spot} = a.\mathit{length}$, $b[\mathit{spot}]$ and $\mathit{spot} < i$, while for
proving the second assertion, the method requires four additional predicates:
$0 \leq j$, $j < i$, $j < \mathit{spot}$ and $b[j]$.

In contrast, our method does not require any user-supplied predicates,
although we should note that some of these predicates can be generated by several
heuristic methods that are also described in [13]. We should also note that the
invariants in [13] are more concise, as they are given in disjunctive normal form.
Unlike [13], our method works backward starting from the property to be checked
and it naturally discovers the necessary constraints over the program’s variables,
through symbolic execution and refinement. An interesting future research
direction is to use the method presented in [13] in conjunction with ours: at each
step $k$, instead of using approximation we could use the predicate abstraction
based method, starting from the set of constraints $U_k$.

Node partition (Node l, int v) {            void m(int n) {
  Node curr = l;                             1: int x = 0;
  Node prev = null;                          2: int y = 0;
  Node newl = null;                          3: while (x < n) {
  Node nextCurr;                             4:   x++;
  while (curr != null) {                     5:   y++;
    nextCurr = curr.next;                       }
    if (curr.elem > v) {                     6: /* hint: x == y; */
      if (prev != null)                      7: while (x != 0) {
        prev.next = nextCurr;                8:   x--;
      if (curr == l)                         9:   y--;
        l = nextCurr;                           }
      curr.next = newl;                     10: assert (y == 0);
      assert curr != prev;                  }
      newl = curr;
    }
    else {
      prev = curr;
    }
    curr = nextCurr;
  }
  return newl;
}

Fig. 8. Method partition (left) and another example (right)

List Partition. Figure 8 (left) shows a list partitioning example adapted again
from [13]. Each list element is an instance of the class Node, and contains two
fields: an integer elem and a reference next to the following node in the list.
The method partition takes two arguments, a list l and an integer value v. It
removes every node with value greater than v from l and returns a list containing
all those nodes. The assertion states that curr is not aliased with prev. Our
framework checks that there are no assertion violations and it generates the
following sequence of candidate invariants.

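$$I_0 = \neg(\mathit{curr} = \mathit{prev} \wedge \mathit{curr} \neq \mathit{null} \wedge \mathit{curr.elem} > v).$$

$$I_0^1 = I_0 \wedge \neg(\mathit{curr} \neq \mathit{prev} \wedge \mathit{curr} \neq \mathit{null} \wedge \mathit{prev} \neq \mathit{null} \wedge \mathit{curr.elem} > v).$$

Approximate invariant $I_0^1$ is too strong ($I_0^1$ leads to a base case violation).
The framework then backtracks and continues with the next exact invariant:

$$I_1 = I_0 \wedge \neg(\mathit{curr} \neq \mathit{prev} \wedge \mathit{curr} \neq \mathit{null} \wedge \mathit{prev} \neq \mathit{null} \wedge \mathit{curr.elem} > v \wedge \mathit{prev.elem} > v \wedge \mathit{prev} = \mathit{curr.next}).$$

$$I_1^1 = I_1 \wedge \neg(\mathit{curr} \neq \mathit{prev} \wedge \mathit{curr} \neq \mathit{null} \wedge \mathit{prev} \neq \mathit{null} \wedge \mathit{curr.elem} > v \wedge \mathit{prev.elem} > v \wedge \mathit{prev} \neq \mathit{curr.next}).$$

Approximate invariant $I_1^1$ is inductive. This example has shown that our
framework can handle constraints on structured data. We also successfully
applied our framework to the examples presented in [11], where we checked the
absence of null pointer dereferences.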
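To make the behaviour of partition concrete, the sketch below runs it on a small list. It is an illustration only: the class `PartitionDemo`, the `Result` holder and the helpers `list` and `show` are hypothetical additions. The method in the figure returns only the removed nodes; this version also reports the head of the remaining list so that both halves can be inspected.

```java
class PartitionDemo {
    static final class Node {
        int elem;
        Node next;
        Node(int elem, Node next) { this.elem = elem; this.next = next; }
    }

    // Hypothetical holder for both halves of the partitioned list.
    static final class Result {
        Node kept;     // nodes with elem <= v, in their original order
        Node removed;  // nodes with elem > v, in reverse order of removal
    }

    static Result partition(Node l, int v) {
        Node curr = l;
        Node prev = null;
        Node newl = null;
        while (curr != null) {
            Node nextCurr = curr.next;
            if (curr.elem > v) {
                if (prev != null) prev.next = nextCurr;  // unlink curr
                if (curr == l) l = nextCurr;             // curr was the head
                curr.next = newl;                        // push onto result
                if (curr == prev) throw new AssertionError("curr aliased with prev");
                newl = curr;
            } else {
                prev = curr;
            }
            curr = nextCurr;
        }
        Result r = new Result();
        r.kept = l;
        r.removed = newl;
        return r;
    }

    // Builds a list from the given values (helper for the example).
    static Node list(int... xs) {
        Node head = null;
        for (int i = xs.length - 1; i >= 0; i--) head = new Node(xs[i], head);
        return head;
    }

    static String show(Node n) {
        StringBuilder sb = new StringBuilder("[");
        for (; n != null; n = n.next) {
            sb.append(n.elem);
            if (n.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Result r = partition(list(5, 1, 8, 2, 9), 4);
        System.out.println("kept    = " + show(r.kept));     // [1, 2]
        System.out.println("removed = " + show(r.removed));  // [9, 8, 5]
    }
}
```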



Pathological Example. The iterative method for invariant generation
presented in Section 4 might not terminate. For example, consider the code in
Figure 8 (right).²

As our method works backward from the property, we first attempt to
compute a loop invariant for the second loop. Our iterative refinement will not
terminate for this loop. Increasing the number of exact strengthening steps
does not help. Intuitively, the method does not converge because the
constraint x = y (and its negation) is “important” for achieving termination,
but this constraint does not get discovered by repeated symbolic executions of
the code in the loop body.

² Note that several other methods, such as the predicate abstraction with refinement
as implemented in the SLAM tool [1], would also not terminate on this example.

The programmer can provide additional helpful constraints by hand in the 
form of “hints”, to boost the precision of the iterative approximation method. 
For example, the hint at line (6) in the code of Figure 8 (right) has the effect 
of nondeterministically adding the constraint (and its negation) to the current 
path condition, and hence these constraints are also added to the universe of 
constraints at each strengthening step. With this hint, we get the following loop
invariant for the second loop ($k = 0$, two approximation steps):

$$\neg(y \neq 0 \wedge x = 0) \wedge \neg(y \leq 0 \wedge x > 0) \wedge \neg(y > 0 \wedge x \neq y).$$

Using this invariant as the postcondition for the first loop, we then get the
following loop invariant for the first loop, which suffices to prove the property:

$$\neg(x \geq n \wedge x \neq y) \wedge \neg(x < 0) \wedge \neg(x \geq 0 \wedge x < n \wedge x \neq y).$$
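As a sanity check, the two loop invariants can be evaluated at every loop head during concrete runs of the method. The sketch below is an assumption-laden illustration: the class `HintDemo` and the predicates `inv1` and `inv2` are hypothetical names, and the comparison operators are read as non-strict (≥, ≤) where the bounds require it.

```java
class HintDemo {
    // Invariant of the first loop, as read from the formula above.
    static boolean inv1(int x, int y, int n) {
        return !(x >= n && x != y)
            && !(x < 0)
            && !(x >= 0 && x < n && x != y);
    }

    // Invariant of the second loop, as read from the formula above.
    static boolean inv2(int x, int y) {
        return !(y != 0 && x == 0)
            && !(y <= 0 && x > 0)
            && !(y > 0 && x != y);
    }

    // Runs the method m of Fig. 8 (right), checking each invariant at its
    // loop head on every iteration. Returns the final value of y.
    static int m(int n) {
        int x = 0;
        int y = 0;
        while (true) {
            if (!inv1(x, y, n)) throw new AssertionError("inv1 fails at x=" + x);
            if (!(x < n)) break;
            x++;
            y++;
        }
        while (true) {
            if (!inv2(x, y)) throw new AssertionError("inv2 fails at x=" + x);
            if (!(x != 0)) break;
            x--;
            y--;
        }
        if (y != 0) throw new AssertionError("assert (y == 0) violated");
        return y;
    }

    public static void main(String[] args) {
        for (int n = -3; n <= 10; n++) {
            m(n);
        }
        System.out.println("all invariant checks passed");
    }
}
```

Passing these checks on sample inputs does not prove inductiveness; it only confirms that the invariants are not violated on the executions tried.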

We should note that more powerful techniques, such as the linear equalities
abstract domain [22], would work for this example. We would like to use our
framework in conjunction with more powerful abstraction techniques (such as [22])
or with alternative dynamic methods for discovering loop invariants (e.g. the
Daikon tool [12] could be used to provide useful “hints”).

6 Related Work 

Throughout the paper, we have discussed related work on invariant generation. 
Here we link our approach to software verification tools. King [24] developed 
EFFIGY, a system for symbolic execution of programs with a fixed number of 
integer variables. EFFIGY supported various program analyses (such as asser- 
tion based correctness checking) and is one of the earliest systems of its kind. 

Several projects aim at developing static analyses for verifying program prop- 
erties. The Extended Static Checker (ESC) [10] uses a theorem prover to verify 
partial correctness of classes annotated with JML specifications. ESC has been 
used to verify absence of such errors as null pointer dereferences, array bounds 
violations, and division by zero. However, tools like ESC rely heavily on speci- 
fications provided by the user and they could benefit from invariant generation 
techniques such as ours. 

The Three-Valued-Logic Analyzer (TVLA) [28] is a static analysis system
for verifying rich structural properties, such as preservation of a list structure in 
programs that perform list reversals via destructive updating of the input list. 
TVLA performs fixed point computations on shape graphs, which represent heap 
cells by shape nodes and sets of indistinguishable runtime locations by summary 
nodes. Our approximation technique has similarities to widening operations used 
in static analysis. We would like to explore this connection further. 

The pointer assertion logic engine (PALE) [26] can verify a large class of data
structures that can be represented by a spanning tree backbone, with possibly
additional pointers that do not add extra information. These data structures
include doubly linked lists, trees with parent pointers, and threaded trees. Shape
analyses, such as TVLA and PALE, typically do not verify properties of programs
that perform operations on numeric data values.

There has been a lot of recent interest in applying model checking to software. 
Java PathFinder [30] and VeriSoft [15] operate directly on Java and C programs,
respectively. Other projects, such as Bandera [6], translate Java programs into
the input language of verification tools. Our work would extend such tools with 
the ability to prove partial-correctness specifications. The Composite Symbolic 
Library [33] uses symbolic forward fixed point operations to compute the reach- 
able states of a program. It uses widening to help termination but can analyze 
programs that manipulate lists with only a fixed number of integer fields and it 
can only deal with closed systems. 

The SLAM tool [1] focuses on checking sequential C code with static data, 
using well-engineered predicate abstraction and abstraction refinement tools. It 
does not handle dynamically allocated data structures. Symbolic execution is 
used to map abstract counterexamples on concrete executions and to refine the 
abstraction, by adding new predicates discovered during symbolic execution. 
We should note that tools like SLAM perform abstraction on each program 
statement, whereas our method performs approximation (which can be seen 
as a form of abstraction) only when necessary, at loop headers. This indicates 
that our method is potentially cheaper in terms of the number of predicates (i.e. 
constraints) required. Of course, further experimentation is necessary to support 
this claim. There are many similarities between predicate abstraction and our 
iterative approximation method and we would like to compare the two methods 
in terms of relative completeness (as in [2,9]). 



7 Conclusion 

We presented a novel framework based on symbolic execution for the verification 
of software. The framework uses annotations in the form of method specifica- 
tions and loop invariants. We presented a novel iterative technique for discover- 
ing these loop invariants automatically. The technique works backward from the 
property to be checked and it systematically applies approximation to achieve 
termination. The technique handles uniformly both numeric constraints and con- 
straints on structured data and it allows for checking universally quantified for- 
mulas. We illustrated the applicability of our framework to the verification of 
several non-trivial Java programs. Although we made our presentation in the 
context of Java programs, JPF, and the Omega library, our framework can be 
instantiated with other languages, model checkers and decision procedures. In 
the future, we plan to investigate the application of widening and other more 
powerful abstraction techniques in conjunction with our method for invariant 
generation. We also plan to extend our framework to handle multithreading 
and richer properties. We would also like to integrate different (semi) decision
procedures and constraint solvers that will allow us to handle floats and
non-linear constraints. We believe that our framework presents a promising
flexible approach for the analysis of software. How well it scales to real
applications remains to be seen.

Abstract. The model checking of counters systems often reduces to the
effective computation of the set of predecessors $\mathrm{Pre}^*(X')$ of a
Presburger-definable set $X'$. Because there often exists an integer $k \geq 0$
such that $\mathrm{Pre}^{\leq k}(X') = \mathrm{Pre}^*(X')$, we first look for an
efficient algorithm to compute the set $\mathrm{Pre}(X)$ as a function of $X$.
In general, such a computation is exponential in the size of $X$. In [BB03], the
computation is proved to be polynomial for a restrictive class of counters
systems. In this article we show that for any counters system, the computation
is polynomial. Then we show that the computation of $\mathrm{Pre}^{\leq k}(X')$
is polynomial in $k$ (and exponential in the dimension $m$) for effective
counters systems with interval-definable sets.



1 Introduction 

Model checking infinite-state transition systems often reduces to the effective 
computation of the potentially infinite set of predecessors Pre*. More precisely, 
the safety model checking can be expressed as the following problem. 

Given as inputs: an infinite-state transition system, a set $S_0$ of initial states,
and a set $S_{bad}$ of non-safe states;
the question is: $S_0 \cap \mathrm{Pre}^*(S_{bad}) = \emptyset$ ?

To succeed in computing the limit $\mathrm{Pre}^*(S_{bad})$ of the infinite non-decreasing
sequence $(\mathrm{Pre}^{\leq i}(S_{bad}))_i$ we need three properties:

– The convergence of the sequence: $\exists k;\ \mathrm{Pre}^*(S_{bad}) = \mathrm{Pre}^{\leq k}(S_{bad})$,

– An efficient algorithm for computing $\mathrm{Pre}(X)$ from $X$, and

– An efficient algorithm for computing $(\mathrm{Pre}^{\leq i}(X))_i$ from $X$.
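On a finite toy system the limit can be computed by plain fixpoint iteration, which makes the role of the index $k$ visible. The sketch below is illustrative only: the bounded counter, the single "add 2" action and all names (`PreStarDemo`, `BOUND`, `ACTIONS`) are assumptions introduced for the example, and explicit sets stand in for the symbolic representations discussed in this article.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntUnaryOperator;

class PreStarDemo {
    // Hypothetical bound that makes the state space {0, ..., BOUND} finite.
    static final int BOUND = 15;

    // One action: add 2 to the counter; it is disabled (encoded as -1)
    // when the result would exceed the bound.
    static final IntUnaryOperator[] ACTIONS = {
        x -> (x + 2 <= BOUND) ? x + 2 : -1
    };

    // Pre(target): states having at least one successor in target.
    static Set<Integer> pre(Set<Integer> target) {
        Set<Integer> result = new TreeSet<>();
        for (int x = 0; x <= BOUND; x++) {
            for (IntUnaryOperator f : ACTIONS) {
                if (target.contains(f.applyAsInt(x))) {
                    result.add(x);
                }
            }
        }
        return result;
    }

    // Computes Pre*(bad) as the limit of the non-decreasing sequence
    // Pre^{<=i}(bad); steps[0] receives the least k where the limit is reached.
    static Set<Integer> preStar(Set<Integer> bad, int[] steps) {
        Set<Integer> current = new TreeSet<>(bad);
        int k = 0;
        while (true) {
            Set<Integer> next = new TreeSet<>(current);
            next.addAll(pre(current));
            if (next.equals(current)) {
                break;
            }
            current = next;
            k++;
        }
        steps[0] = k;
        return current;
    }

    // Safety question: is the intersection of s0 and Pre*(bad) empty?
    static boolean isSafe(Set<Integer> s0, Set<Integer> bad) {
        return Collections.disjoint(s0, preStar(bad, new int[1]));
    }

    public static void main(String[] args) {
        int[] steps = new int[1];
        Set<Integer> bad = Collections.singleton(15);
        System.out.println("Pre*(bad) = " + preStar(bad, steps));
        System.out.println("reached at k = " + steps[0]);
        System.out.println("safe from {0}: " + isSafe(Collections.singleton(0), bad));
    }
}
```

Here the sequence stabilises after seven steps on the odd values, so the initial state 0 is safe; in the infinite-state setting neither finiteness nor convergence is given, which is why the three properties above matter.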

The convergence of $(\mathrm{Pre}^{\leq i}(S_{bad}))_i$, with $S_{bad}$ an upward closed set, is ensured
for Well Structured Transition Systems (WSTS) [FS01], but even for simple
WSTS (for instance, lossy channel systems), the index $k$ such that
$\mathrm{Pre}^*(S_{bad}) = \mathrm{Pre}^{\leq k}(S_{bad})$ may be very large [Sch02]. However, even if the
convergence is not guaranteed by the theory (for instance when the set of initial
states is not upward-closed), in practice we observe that the sequence
$(\mathrm{Pre}^{\leq i}(S_0))_i$ often converges [Del00] [BB03] and that it often converges
quickly ([Bra] [Bab]). This explains why we focus our attention on the second
problem, to obtain an efficient algorithm for computing $\mathrm{Pre}(X)$ from $X$, and
then on the third problem, efficiently computing $(\mathrm{Pre}^{\leq i}(X))_i$. We would like
to understand which conditions ensure an efficient computation of the sequence
$(\mathrm{Pre}^{\leq i}(S_{bad}))_i$ and thereby, perhaps, to better understand the good
performance of some recent tools: Brain, Babylon.

We first have to fix the model of our infinite-state transition systems, the 
type of infinite sets and the way to represent these sets. 

Infinite-state transition systems. We will focus on programs with integer 
variables and more precisely on counters systems in which a transition is defined 
by a Presburger function relating the values of the counters before and after 
the transition is fired. This model is very general (our model restricts neither 
the number of counters nor the upward closed guards [FMP99]) and powerful 
([FL02], [Ler03a]) so the price to pay is the undecidability of reachability 
properties. Generally, the transition relation will be effectively given by a 
saturated digit automaton [Ler03a]. 

Representations of Presburger-definable sets. In order to effectively com- 
pute Pre(X'), one generally needs to find a class of infinite sets which has the 
following properties: (1) closure under union, (2) closure under Pre, (3) mem- 
bership and inclusion are decidable with a good complexity, and (4) there exists 
a canonical representation. 

Semi-linear bases/periods and Presburger formulas are not canonical
representations of Presburger-definable sets (which are also semi-linear sets). Recall
that Number Decision Diagrams (NDD) ([Boi98], [WB00], [BC96]) provide an
$m$-digit by $m$-digit representation of vectors in $\mathbb{N}^m$ whereas Saturated Digit
Automata (SDA) use a digit-by-digit representation of vectors in $\mathbb{N}^m$. We prove
that NDD and SDA have the same expressiveness but the theory of SDA enjoys
a useful characterization of the minimal SDA associated with a set $X$.

Our computation problems. We can now make precise the inputs of our two
computation problems.

1. The first problem is to compute $\mathrm{Pre}_S(X')$:

Input: a counters system $S$ and a saturated digit automaton $\mathcal{A}'$ that
represents a set $X' \subseteq \mathbb{N}^m$.

Question: Can we compute, in time polynomial in the size of $\mathcal{A}'$, a saturated
digit automaton $\mathcal{A}$ representing $X = \mathrm{Pre}_S(X')$ ?

2. The second problem is to compute the $k$-th element $\mathrm{Pre}^{\leq k}_S(X')$:

Input: a counters system $S$, a saturated digit automaton $\mathcal{A}'$ that represents
a set $X' \subseteq \mathbb{N}^m$ and an integer $k$.

Question: Can we compute, in time polynomial in $k$, a saturated digit
automaton $\mathcal{A}$ representing $X = \mathrm{Pre}^{\leq k}_S(X')$ ?

Related works. We use the approach called regular model checking: for
channel systems, Semi-Linear Regular Expressions [FPS00] and Constrained
Queue-content Decision Diagrams [BH99] have been proposed; for lossy
channel systems [ABJ98], the tool Lcs (in the more general tool TreX [ABS01]
[Tre]) uses the downward-closed regular languages and the corresponding subset
of Simple Regular Expressions for sets and represents them by finite automata
to compute Post*; for stack automata, regular expressions or finite automata are
sufficient to represent Pre* and Post* [BEF+00]; for Petri nets and parameterized
rings, [FO97] uses regular languages and Presburger arithmetic (and
acceleration) for sets. For Transfer and Reset Petri nets [DFS98], the tool Babylon
[Bab] utilizes the upward closed sets and represents them by Covering Sharing
Trees [DRV01], a variant of BDD; for counters automata, the tool Brain [Bra]
uses linear sets and represents them by their linear bases and periods; for extended
counters automata, the tools Fast [Fas], [FL02], [BFLP03] and Lash [Las]
utilize semi-linear sets and represent them by NDD; moreover, these two tools are
able to accelerate loops [Boi] [FL02]; Mona [Mon] [KMS02] and FMona [BF00]
use formulas in WS1S to represent sets; the tool CSL-ALV [BGP97] [Alv] uses
linear arithmetic constraints for sets and manipulates formulas with the Omega
solver and the automata library of Lash.

In [BB03], the computation of $\mathrm{Pre}_S(X')$ is proved to be polynomial for a complex
subclass of counters systems.

Our results. 

1. We introduce SDA as a canonical representation of sets of vectors of integers.
Even if SDA have the same expressive power as NDD, there exists an
elegant theoretical characterization of the minimal SDA associated with a
set $X$, which is useful for computing the size of the minimal SDA.

2. We show that for counters systems $S$, the set of immediate predecessors
$\mathrm{Pre}_S(X')$ is computable in polynomial time in the size of the SDA that
represents $X'$. This result generalizes a recent result of [BB03].

3. We characterize the affine functions for which the inverse image of any
interval-definable set remains interval-definable. Then we prove that the asymptotic
size of the minimal SDA that represents $\mathrm{Pre}^{\leq k}_S(X')$ is polynomial in $k$ and
exponential in $m$.

Plan of the paper. Saturated Digit Automata are introduced in Section 3 and
compared with NDD. In Section 4, we define counters systems and
prove that the computation of $\mathrm{Pre}_S(X')$ is polynomial in the size of the SDA
that represents $X'$. In the last section, Section 5, the asymptotic size of the
minimal SDA representing $\mathrm{Pre}^{\leq k}_S(X')$ is proved to be polynomial in $k$.

2 Preliminaries 

The cardinality of a finite set $X$ is written $\mathrm{card}(X)$.

The sets of rationals, integers and non-negative integers are respectively written
$\mathbb{Q}$, $\mathbb{Z}$ and $\mathbb{N}$. The set of vectors with $m$ components in a set $X$ is written
$X^m$. The $i$-th component of a vector $x \in X^m$ is written $x_i \in X$; we have
$x = (x_1, \ldots, x_m)$. For any vectors $v, v' \in \mathbb{Q}^m$ and for any $t \in \mathbb{Q}$, we define
$t.v$ and $v + v'$ in $\mathbb{Q}^m$ by $(t.v)_i = t.v_i$ and $(v + v')_i = v_i + v'_i$. The vector
$e_i \in \mathbb{N}^m$ is defined by $(e_i)_j = 1$ if $j = i$ and $(e_i)_j = 0$ otherwise.

The set of square matrices of size $m$ in $\mathbb{Q}$ is written $\mathcal{M}_m(\mathbb{Q})$. A function
$f : D \rightarrow \mathbb{N}^m$ with $D \subseteq \mathbb{N}^m$ is affine if there exist a square matrix
$M \in \mathcal{M}_m(\mathbb{Q})$ and a vector $v \in \mathbb{Q}^m$ such that $f(x) = M.x + v$ for every $x \in D$.
Remark that rational matrices are needed for representing some affine functions
like $f : 2.\mathbb{N} \rightarrow \mathbb{N}$ defined by $f(x) = \frac{x}{2}$.

The set of words over a finite alphabet $\Sigma$ is written $\Sigma^*$. The concatenation
of two words $\sigma$ and $\sigma'$ in $\Sigma^*$ is written $\sigma.\sigma'$. The empty word in $\Sigma^*$ is
written $\epsilon$.

A finite automaton $\mathcal{A}$ is a tuple $\mathcal{A} = (Q, \Sigma, \Delta, Q_0, F)$: $Q$ is the finite set of
states, $\Sigma$ is the finite alphabet, $\Delta \subseteq Q \times \Sigma \times Q$ is the transition relation,
$Q_0 \subseteq Q$ is the set of initial states and $F \subseteq Q$ is the set of final states. The
size of a finite automaton $\mathcal{A}$ is $|\mathcal{A}| = \mathrm{card}(Q)$. A finite automaton $\mathcal{A}$ is said
to be deterministic if the set $Q_0$ is reduced to one element, $Q_0 = \{q_0\}$, and if
there exists a function $\delta$ defined over a subset of $Q \times \Sigma$ into $Q$ such that
$\Delta = \{(q, a, \delta(q, a));\ q \in Q;\ a \in \Sigma\}$. A deterministic automaton is said to be
complete if the function $\delta$ is defined over the whole set $Q \times \Sigma$. A path $P$ in a
finite automaton $\mathcal{A}$ from a state $q$ to a state $q'$ is a finite sequence
$q = q_0, (q_0, a_1, q_1), q_1, \ldots, (q_{n-1}, a_n, q_n), q_n = q'$ with $n \geq 0$ such that
$(q_{i-1}, a_i, q_i)$ is a transition in $\Delta$. The label of $P$ is the word
$\sigma = a_1 \ldots a_n \in \Sigma^*$. Such a path is also written $q \xrightarrow{\sigma} q'$. The state $q'$ is
said to be reachable from $q$ and $q$ is said to be co-reachable from $q'$. The language
accepted by a finite automaton $\mathcal{A}$ is
$\mathcal{L}(\mathcal{A}) = \{\sigma \in \Sigma^*;\ \exists q_0 \in Q_0;\ \exists q_f \in F;\ q_0 \xrightarrow{\sigma} q_f\}$.

Let us recall the two considered logics:

– The Presburger logic ([Ber77]) is built with the following formulas:

$$\phi := \sum_{i \in I} c_i.v_i = c \mid \exists v\, \phi \mid \forall v\, \phi \mid \phi \vee \phi \mid \phi \wedge \phi \mid \neg\phi \mid \mathit{true} \mid \mathit{false}$$

where $(c_i)_{i \in I}$ is a finite sequence of $\mathbb{N}$, $c \in \mathbb{N}$ and $(v_i)_{i \in I}$ are in a finite set
$V$ of variables.

– The interval logic ([Str98]) (a.k.a. simple constraints [AAB00]) is defined by
the following formulas:

$$\phi := v_i = c \mid \phi \vee \phi \mid \phi \wedge \phi \mid \neg\phi \mid \mathit{true} \mid \mathit{false}$$

A set $X \subseteq \mathbb{N}^m$ is said to be Presburger-definable (resp. interval-definable) if it
can be defined by a Presburger formula (resp. by a formula in the interval logic).



3 Saturated Digit Automata 

Recall that there exist two natural ways to associate a vector in $\mathbb{N}^m$ with a
word $\sigma$, depending on whether the first letter of $\sigma$ is considered as a “high bit”
or a “low bit”. In this article, we consider the “low bit” representation (even if
the other one seems to be symmetrical, the results proved in this paper cannot
easily be extended to it).

Let us consider an integer $r \geq 2$ called the basis of decomposition and an
integer $m \geq 1$ called the dimension of the represented vectors. A digit $b$ is an
element of the finite alphabet $\Sigma_r = \{0, \ldots, r - 1\}$. In general ([Boi98] [WB00],
[BC96]), a vector in $\mathbb{N}^m$ is only associated with words of digits whose length
is a multiple of $m$. However, as shown in this article, an extension to words of
any length can be useful.

As in [Ler03a,Ler03b], the function $\gamma_\sigma : \mathbb{N}^m \rightarrow \mathbb{N}^m$ is defined by the induction
$\gamma_{\sigma.\sigma'} = \gamma_\sigma \circ \gamma_{\sigma'}$ and $\gamma_b((x_1, \ldots, x_m)) = (r.x_m + b, x_1, \ldots, x_{m-1})$ for any digit $b$.
Let us remark that if $m$ divides the length of
$\sigma = (b_{1,1} \ldots b_{1,m}) \ldots (b_{n,1} \ldots b_{n,m})$ then the following equality holds:

$$\gamma_\sigma((0, \ldots, 0)) = \sum_{i=1}^{n} r^{i-1}.(b_{i,1}, \ldots, b_{i,m})$$

Naturally, the vector $\rho_m(\sigma) = \gamma_\sigma((0, \ldots, 0))$ is called the vector associated to $\sigma$.
Thanks to the function $\rho_m : \Sigma_r^* \rightarrow \mathbb{N}^m$, we can now define the Saturated Digit
Automata and the Number Decision Diagrams.
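The definition of $\rho_m$ can be executed directly by folding the digits of the word through $\gamma_b$. The following is a small sketch under stated assumptions: the class name `RhoDemo` is hypothetical, digits are passed as an `int[]`, and `long` arithmetic is used without overflow checks.

```java
import java.util.Arrays;

class RhoDemo {
    // gamma_b((x_1, ..., x_m)) = (r*x_m + b, x_1, ..., x_{m-1})
    static long[] gamma(int r, int b, long[] x) {
        int m = x.length;
        long[] y = new long[m];
        y[0] = r * x[m - 1] + b;
        System.arraycopy(x, 0, y, 1, m - 1);
        return y;
    }

    // rho_m(sigma) = gamma_sigma((0, ..., 0)). Since
    // gamma_{b_1 ... b_n} = gamma_{b_1} o ... o gamma_{b_n},
    // the last digit of the word is applied first.
    static long[] rho(int r, int m, int[] sigma) {
        long[] x = new long[m];
        for (int i = sigma.length - 1; i >= 0; i--) {
            x = gamma(r, sigma[i], x);
        }
        return x;
    }

    public static void main(String[] args) {
        // r = 2, m = 2: the blocks (1,0)(0,1)(1,1) give
        // (1,0) + 2*(0,1) + 4*(1,1) = (5, 6).
        System.out.println(Arrays.toString(rho(2, 2, new int[] {1, 0, 0, 1, 1, 1})));
        // A word whose length is not a multiple of m is still mapped to a vector,
        // and appending the digit 0 does not change that vector.
        System.out.println(Arrays.toString(rho(2, 2, new int[] {1, 0, 0, 1, 1})));
        System.out.println(Arrays.toString(rho(2, 2, new int[] {1, 0, 0, 1, 1, 0})));
    }
}
```

The last two calls return the same vector, which illustrates why a language of the form $\rho_m^{-1}(X)$ is closed under appending zeros.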

Definition 1. A Saturated Digit Automaton (SDA) $\mathcal{A}$ that represents a set
$X \subseteq \mathbb{N}^m$ is a deterministic and complete automaton over $\Sigma_r$ such that
$\mathcal{L}(\mathcal{A}) = \rho_m^{-1}(X)$. Such a set $X$ is called SDA-definable.

Definition 2. A Number Decision Diagram (NDD) $\mathcal{A}$ ([Boi98] [WB00]) that
represents a set $X \subseteq \mathbb{N}^m$ is a deterministic and complete automaton over $\Sigma_r$
such that $\mathcal{L}(\mathcal{A}) = \rho_m^{-1}(X) \cap (\Sigma_r^m)^*$.



Remark 1. NDD can also represent vectors in $\mathbb{Z}^m$ with a “high” or “low” bit
first representation. Whereas the results proved in this article can be extended
to $\mathbb{Z}^m$, an extension to the “high” bit first representation seems difficult.

The following proposition shows that SDA and NDD represent the same subsets
of $\mathbb{N}^m$.

Proposition 1.

– From any NDD $\mathcal{A}$, we can effectively compute in time $O(r.|\mathcal{A}|)$ an SDA $\mathcal{A}'$
that represents the same subset, such that $|\mathcal{A}'| \leq |\mathcal{A}|$.

– From any SDA $\mathcal{A}$, we can effectively compute in time $O(r.m.|\mathcal{A}|)$ an NDD
$\mathcal{A}'$ that represents the same subset, such that $|\mathcal{A}'| \leq m.|\mathcal{A}|$.

Proof. (Sketch). Let us consider an NDD $\mathcal{A}$ that represents a set $X$. By replacing
the set of final states $F$ of $\mathcal{A}$ by the set $F' = \{q \in Q;\ \exists q_f \in F;\ q \xrightarrow{0^*} q_f\}$, we
deduce an SDA $\mathcal{A}'$ that represents $X$.

Now, let us consider an SDA $\mathcal{A}$ that represents a set $X$. As $\mathcal{L}(\mathcal{A}) = \rho_m^{-1}(X)$, the
“synchronized product” of $\mathcal{A}$ and the automaton with $m$ states that recognizes
the language $(\Sigma_r^m)^*$ provides an NDD $\mathcal{A}'$ that also represents $X$.



Remark 2. As any Presburger-definable set can be effectively represented by an
NDD [WB00], the same result holds for SDA.

We have introduced the class of SDA rather than using NDD because
the minimal SDA that represents a set $X$ is given by the “residues” of $X$.

Definition 3. The set $\sigma^{-1}.X = \gamma_\sigma^{-1}(X)$ is called the residue of $X \subseteq \mathbb{N}^m$ by
$\sigma \in \Sigma_r^*$.

From $\gamma_{\sigma_1.\sigma_2} = \gamma_{\sigma_1} \circ \gamma_{\sigma_2}$, we deduce the equality
$\sigma_2^{-1}.(\sigma_1^{-1}.X) = (\sigma_1.\sigma_2)^{-1}.X$ that enables us to give the following definition.

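Definition 4. Let $X \subseteq \mathbb{N}^m$ be such that its set of residues
$Q(X) = \{\sigma^{-1}.X;\ \sigma \in \Sigma_r^*\}$ is finite. The deterministic and complete automaton
$\mathcal{A}(X)$ is defined by:

$$\mathcal{A}(X) = (Q(X), \Sigma_r, \delta, q_0, F) \quad \text{with} \quad
\begin{cases}
\delta(q, b) = b^{-1}.q \\
q_0 = X \\
F = \{q \in Q(X);\ (0, \ldots, 0) \in q\}
\end{cases}$$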



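Lemma 1. For any $X \subseteq \mathbb{N}^m$ and $\sigma \in \Sigma_r^*$, we have
$\sigma^{-1}.\rho_m^{-1}(X) = \rho_m^{-1}(\sigma^{-1}.X)$.

Proof. We have $w \in \sigma^{-1}.\rho_m^{-1}(X)$ iff $\sigma.w \in \rho_m^{-1}(X)$ iff $\rho_m(\sigma.w) \in X$ iff
$\gamma_\sigma(\rho_m(w)) \in X$ iff $\rho_m(w) \in \sigma^{-1}.X$ iff $w \in \rho_m^{-1}(\sigma^{-1}.X)$. □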

The following theorem is really important because it proves that the structure 
of the minimal SDA that represents a set X can be obtained just by studying 
the set of residues of X. 

Theorem 1. A set $X \subseteq \mathbb{N}^m$ is SDA-definable if and only if its set of residues is
finite. Moreover, in this case, $\mathcal{A}(X)$ is the unique minimal SDA that represents
$X$.

Proof. Assume that $Q(X)$ is a finite set. We are going to show that $\mathcal{A}(X)$
is an SDA that represents $X$ by proving that $\mathcal{L}(\mathcal{A}(X)) = \rho_m^{-1}(X)$. We have
$\sigma \in \mathcal{L}(\mathcal{A}(X))$ iff $(0, \ldots, 0) \in \sigma^{-1}.X = \gamma_\sigma^{-1}(X)$. Therefore $\sigma \in \mathcal{L}(\mathcal{A}(X))$ iff
$\rho_m(\sigma) = \gamma_\sigma((0, \ldots, 0)) \in X$. Hence, we have proved that $\mathcal{L}(\mathcal{A}(X)) = \rho_m^{-1}(X)$. In
particular $\rho_m(\mathcal{L}(\mathcal{A}(X))) = X$ and $\rho_m^{-1}(\rho_m(\mathcal{L}(\mathcal{A}(X)))) = \mathcal{L}(\mathcal{A}(X))$. We have proved
that $\mathcal{A}(X)$ is an SDA that represents $X$.

Now, assume that $X$ is SDA-definable and let us prove that $Q(X)$ is finite.
The language $L = \rho_m^{-1}(X)$ is regular. As the minimal deterministic and complete
automaton that recognizes $L$ is unique, there exists a unique minimal SDA that
represents $X$. Recall that the set of states of this minimal automaton is given
by $\{\sigma^{-1}.L;\ \sigma \in \Sigma_r^*\}$. From Lemma 1, we deduce that
$Q(X) = \{\rho_m(\sigma^{-1}.L);\ \sigma \in \Sigma_r^*\}$. Therefore, $Q(X)$ is finite and, by uniqueness of
the minimal automaton, $\mathcal{A}(X)$ is the unique minimal SDA that represents $X$. □
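The residue construction can be traced on a very simple family of sets. The sketch below is an illustration under explicit assumptions, not the general algorithm: it takes $m = 1$ and $X_t = \{x \in \mathbb{N};\ x \leq t\}$, encodes each such set by its threshold $t$ (with $t = -1$ standing for the empty set), and uses $b^{-1}.X_t = \{x;\ r.x + b \leq t\}$. The class `ResidueDemo` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

class ResidueDemo {
    // Residue of X_t by the digit b: { x : r*x + b <= t }, which is
    // X_{floor((t - b) / r)} when b <= t, and the empty set (-1) otherwise.
    static int residue(int r, int b, int t) {
        return (t < b) ? -1 : Math.floorDiv(t - b, r);
    }

    // Explores the residues of X_c. Each key is a state (a threshold) and the
    // associated array gives its successor for every digit 0, ..., r-1.
    static Map<Integer, int[]> residueAutomaton(int r, int c) {
        Map<Integer, int[]> delta = new LinkedHashMap<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.add(c);
        while (!todo.isEmpty()) {
            int t = todo.poll();
            if (delta.containsKey(t)) {
                continue;
            }
            int[] succ = new int[r];
            for (int b = 0; b < r; b++) {
                succ[b] = residue(r, b, t);
                todo.add(succ[b]);
            }
            delta.put(t, succ);
        }
        return delta;
    }

    // A state is final iff 0 belongs to the set it denotes, i.e. iff t >= 0.
    static boolean accepts(Map<Integer, int[]> delta, int c, int[] word) {
        int t = c;
        for (int b : word) {
            t = delta.get(t)[b];
        }
        return t >= 0;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> delta = residueAutomaton(2, 5);
        System.out.println("states: " + delta.keySet());
        System.out.println("5 accepted: " + accepts(delta, 5, new int[] {1, 0, 1}));
        System.out.println("6 accepted: " + accepts(delta, 5, new int[] {0, 1, 1}));
    }
}
```

For $r = 2$ and $c = 5$ the exploration visits the thresholds 5, 2, 1, 0 and the empty set. Distinct thresholds denote distinct sets, so, in line with Theorem 1, these five residues are exactly the states of the minimal SDA for this example.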



Remark 3. Finding a theorem equivalent to the previous one for the class of NDD
seems difficult.

4 Polynomial Time Computation of $\mathrm{Pre}_S(X')$

For counters systems $S$, the computation of the set of immediate predecessors
$\mathrm{Pre}_S(X')$ for the SDA representation is proved to be polynomial in time.



Definition 5. A saturated digit automaton $\mathcal{A}$ represents a function
$f : \mathbb{N}^m \rightarrow \mathbb{N}^m$ if it represents the following set of $\mathbb{N}^{2m}$:

$$\{(x_1, x'_1, \ldots, x_m, x'_m);\ (x'_1, \ldots, x'_m) = f((x_1, \ldots, x_m))\}$$

Naturally, a function $f$ is said to be SDA-definable if there exists a saturated digit
automaton that represents $f$.

Remark 4. The previous definition can be extended to binary relations.

Definition 6. A counters system $S$ is a tuple $S = (\mathbb{N}^m, \Sigma, (f_a)_{a \in \Sigma})$ where $\Sigma$ is
a finite set of actions and $f_a : \mathbb{N}^m \rightarrow \mathbb{N}^m$ is an SDA-definable function.

Remark 5. In practice, the function $f_a$ is given by a Presburger formula. However,
Remark 2 shows that any Presburger-definable set is SDA-definable.

The set of immediate predecessors of $X' \subseteq \mathbb{N}^m$ is naturally defined by
$\mathrm{Pre}_S(X') = \bigcup_{a \in \Sigma} f_a^{-1}(X')$.

Remark 6. Any counter automaton can be “simulated” by a counters system just
by adding another counter bounded by the number of control states.

The size $|S|$ of an effective counters system $S$ represented by a sequence of
SDA $(\mathcal{A}_a)_{a \in \Sigma}$ is $|S| = \sum_{a \in \Sigma} |\mathcal{A}_a|$.

Theorem 2. Let $g : \mathbb{N}^m \rightarrow \mathbb{N}^m$ and $X' \subseteq \mathbb{N}^m$ be represented respectively by the
SDA $\mathcal{A}^g$ and by the SDA $\mathcal{A}'$. The set $g^{-1}(X')$ can be effectively represented by
an SDA in time $O(r.(|\mathcal{A}'| + 1)^{|\mathcal{A}^g|})$.

Proof. Let us denote by $Q^g_\bot$ the set of states $q^g \in Q^g$ such that there does
not exist a path from $q^g$ to a final state. Symmetrically, we define $Q'_\bot$. Let
$K = (Q'_\bot \times Q^g) \cup (Q' \times Q^g_\bot)$. We are going to prove that the following automaton
$\mathcal{A} = (Q, \Sigma_r, \delta, \{q_0\}, F)$ is an SDA that represents $g^{-1}(X')$:

$$\begin{cases}
Q = \mathcal{P}(Q' \times Q^g) \\
\delta(q, b) = \{(\delta'(q', b'), \delta^g(q^g, b b'));\ (q', q^g) \in q;\ b' \in \Sigma_r\} \setminus K \\
q_0 = \{(q'_0, q^g_0)\} \setminus K \\
F = \{q_f \in Q;\ \exists q \in Q;\ q \cap (F' \times F^g) \neq \emptyset;\ q_f \xrightarrow{0^*} q\}
\end{cases}$$

Let us prove that the number of reachable states of $\mathcal{A}$ is bounded by $(|Q'| + 1)^{|Q^g|}$.
Let $q$ be a reachable state of $\mathcal{A}$ and let us prove that for any $(q'_1, q^g)$ and $(q'_2, q^g)$
in $q$, we have $q'_1 = q'_2$. There exists a sequence $b_1, \ldots, b_n$ in $\Sigma_r$ such that
$q = \delta(q_0, b_1 \ldots b_n)$. By definition of $\mathcal{A}$, there exist two sequences
$b'_{1,1}, \ldots, b'_{n,1}$ and $b'_{1,2}, \ldots, b'_{n,2}$ in $\Sigma_r$ such that

$$\begin{cases}
q^g = \delta^g(q^g_0, b_1 b'_{1,1} \ldots b_n b'_{n,1}) \\
q^g = \delta^g(q^g_0, b_1 b'_{1,2} \ldots b_n b'_{n,2}) \\
q'_1 = \delta'(q'_0, b'_{1,1} \ldots b'_{n,1}) \\
q'_2 = \delta'(q'_0, b'_{1,2} \ldots b'_{n,2})
\end{cases}$$

As $(q'_1, q^g)$ is not in $K$, we have $q^g \notin Q^g_\bot$. So, there exist a word $\sigma \in \Sigma_r^*$
and a final state $q^g_f \in F^g$ such that $q^g \xrightarrow{\sigma} q^g_f$ is an accepting path in $\mathcal{A}^g$. As
$\mathcal{A}^g$ is an SDA, by replacing $\sigma$ by $\sigma.0$, we can assume that $|\sigma|$ is even. We have
$\sigma = b_{n+1} b'_{n+1} \ldots b_k b'_k$ where $b_i, b'_i \in \Sigma_r$. Let $x'_1$, $x'_2$ and $x$ be the vectors in $\mathbb{N}^m$
defined by:

$$\begin{cases}
x'_1 = \rho_m(b'_{1,1} \ldots b'_{n,1} b'_{n+1} \ldots b'_k) \\
x'_2 = \rho_m(b'_{1,2} \ldots b'_{n,2} b'_{n+1} \ldots b'_k) \\
x = \rho_m(b_1 \ldots b_k)
\end{cases}$$

As $b_1 b'_{1,1} \ldots b_n b'_{n,1} b_{n+1} b'_{n+1} \ldots b_k b'_k$ and $b_1 b'_{1,2} \ldots b_n b'_{n,2} b_{n+1} b'_{n+1} \ldots b_k b'_k$ are two
accepted words in $\mathcal{L}(\mathcal{A}^g)$, we have $x'_1 = g(x) = x'_2$. As $b'_{1,1} \ldots b'_{n,1} b'_{n+1} \ldots b'_k$
and $b'_{1,2} \ldots b'_{n,2} b'_{n+1} \ldots b'_k$ are two words with the same length that represent the
same vector $x'_1 = x'_2$, we have $b'_{1,1} \ldots b'_{n,1} b'_{n+1} \ldots b'_k = b'_{1,2} \ldots b'_{n,2} b'_{n+1} \ldots b'_k$. In
particular, we have proved that $q'_1 = \delta'(q'_0, b'_{1,1} \ldots b'_{n,1}) = \delta'(q'_0, b'_{1,2} \ldots b'_{n,2}) = q'_2$.
Therefore, the number of reachable states of $\mathcal{A}$ is bounded by $(|Q'| + 1)^{|Q^g|}$.

Now, let us prove that $\mathcal{L}(\mathcal{A}) \subseteq \rho_m^{-1}(g^{-1}(X'))$. Consider an accepting path
$q_0 \xrightarrow{w} q_f$ in $\mathcal{A}$. There exists a path $q_f \xrightarrow{0^i} q$ such that $q \cap (F' \times F^g) \neq \emptyset$. Let us
decompose the word $w.0^i$ as a sequence of digits $w.0^i = b_1 \ldots b_n$. There exists
a word $b'_1 \ldots b'_n$ such that $b_1 b'_1 \ldots b_n b'_n \in \mathcal{L}(\mathcal{A}^g)$ and such that $b'_1 \ldots b'_n \in \mathcal{L}(\mathcal{A}')$.
Let $x = \rho_m(b_1 \ldots b_n)$ and $x' = \rho_m(b'_1 \ldots b'_n)$. Remark that
$\rho_{2.m}(b_1 b'_1 \ldots b_n b'_n) = (x_1, x'_1, \ldots, x_m, x'_m)$. Therefore $x' = g(x)$. Moreover, from
$b'_1 \ldots b'_n \in \mathcal{L}(\mathcal{A}')$, we deduce $x' \in X'$. So $x \in g^{-1}(X')$. As
$\rho_m(w) = \rho_m(w.0^i) = \rho_m(b_1 \ldots b_n)$, we have proved $w \in \rho_m^{-1}(g^{-1}(X'))$.

Finally, let us prove that $\rho_m^{-1}(g^{-1}(X')) \subseteq \mathcal{L}(\mathcal{A})$. Consider $w \in \rho_m^{-1}(g^{-1}(X'))$
and let $x = \rho_m(w)$ and $x' = g(x)$. There exists a word $\sigma \in \mathcal{L}(\mathcal{A}^g)$ such that
$\rho_{2.m}(\sigma) = (x_1, x'_1, \ldots, x_m, x'_m)$. As $\mathcal{A}^g$ is an SDA, we can replace $\sigma$ by $\sigma.0^j$ for any
$j \geq 0$. In particular, $|\sigma|$ can be assumed even and greater than $2.|w|$. Let us write
$\sigma = b_1 b'_1 \ldots b_n b'_n$ such that $b_i, b'_i \in \Sigma_r$ and remark that $x = \rho_m(b_1 \ldots b_n)$ and
$x' = \rho_m(b'_1 \ldots b'_n)$. Hence $q_0 \xrightarrow{b_1 \ldots b_n} q$ is a path in $\mathcal{A}$ such that $q \cap (F' \times F^g) \neq \emptyset$. As
$\rho_m(b_1 \ldots b_n) = \rho_m(w)$ and $|w| \leq n$, there exists $i \geq 0$ such that $b_1 \ldots b_n = w.0^i$.
By definition of $F$, we have $w \in \mathcal{L}(\mathcal{A})$. □

Corollary 1. Let $S$ be a counters system. The minimal SDA $\mathcal{A}(\mathrm{Pre}_S(X'))$ is
computable in polynomial time as a function of $\mathcal{A}(X')$.

Proof. Just remark that $\mathrm{Pre}_S(X') = \bigcup_{a \in \Sigma} f_a^{-1}(X')$. By using Hopcroft's
algorithm [Hop71], we can compute the minimal SDA $\mathcal{A}(\mathrm{Pre}_S(X'))$ from an SDA
$\mathcal{A}$ that represents $\mathrm{Pre}_S(X')$ in time $O(|\mathcal{A}|.\ln(|\mathcal{A}|))$. □

The previous corollary shows that $\mathcal{A}(\mathrm{Pre}_S(X'))$ can be computed in
polynomial time in the size of $\mathcal{A}(X')$ for a counters system $S$. Remark that the complexity
is also exponential in $|S|$. However, in the computation of $\mathrm{Pre}^{\leq k}_S(X')$, the size
$|S|$ does not depend on $k$. Moreover, in practice, the size of $S$ is small compared
to the size of $\mathcal{A}(X')$.

Remark 1. In the case of the computation of the immediate successors
$\mathrm{Post}_S(X) = \bigcup_{a \in \Sigma} f_a(X)$, the number of states of $\mathcal{A}(\mathrm{Post}_S(X))$ can be
exponential in the number of states of $\mathcal{A}(X)$. This exponential blow-up comes from
the fact that $f(X)$ corresponds to a "projection" for the function $f : \mathbb{N}^m \rightarrow \mathbb{N}^m$
defined by $f(x_1, \ldots, x_m) = (0, x_2, \ldots, x_m)$ ([Ler03b]).

5 Asymptotic Size of $\mathrm{Pre}_S^{\leq k}(X')$

The polynomial time computation of $\mathrm{Pre}_S(X')$ is a first step to be able to
efficiently compute the set of predecessors in $k$ steps. If each step multiplies the
size of the SDA by 2, after $k$ steps, the size of the SDA that represents the set
of predecessors is greater than $2^k$. In this section, we give sufficient conditions
such that this exponential blow-up cannot appear.

Definition 7. A counters system $S$ is affine if for any $a \in \Sigma$, there exists an
affine function $f_a : D_a \rightarrow \mathbb{N}^m$, $D_a \subseteq \mathbb{N}^m$, such that $x \, R_a \, x'$ iff $x' = f_a(x)$.

Precisely, we show that if $D_a$ and $X'$ are definable in the interval logic (almost
all the counters systems studied in practice satisfy this condition [Str98],
[Del00], [BB02], [FS01], [FL02]), the asymptotic size in $k$ of $\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))$ is
polynomial in $k$.

The size of $\mathcal{A}(X)$ is first bounded in the granularity of the set $X$ defined as
below.

Definition 8. The granularity of an interval-definable set $X$ is the least integer
$\mathrm{gran}(X) \geq 0$, such that $X$ is the set of vectors accepted by a formula in the
interval logic with $c < \mathrm{gran}(X)$:

\[ \phi := v_i = c \mid \phi \vee \phi \mid \phi \wedge \phi \mid \neg\phi \mid \mathit{true} \mid \mathit{false} \]



Proposition 2. For any interval-definable set $X$, we have:

\[ |\mathcal{A}(X)| \leq (r.\mathrm{gran}(X))^m + 2^{3^m} \]

Proof. Recall that the size of the SDA $\mathcal{A}(X)$ is equal to the number of elements
in $\{\gamma_\sigma^{-1}(X);\ \sigma \in \Sigma_r^*\}$. We first prove that for any word $\sigma \in (\Sigma_r^m)^*$ and for any
interval-definable set $X$ such that $r^{|\sigma|/m} \geq \mathrm{gran}(X)$, the granularity of $\gamma_\sigma^{-1}(X)$
is bounded by 1. Next, we show that there exist at most $2^{3^m}$ interval-definable
sets whose granularity is bounded by 1. Finally, from these two results, we
prove the proposition.

So, let us first consider $\sigma \in (\Sigma_r^m)^*$ and an interval-definable set $X$ such
that $r^{|\sigma|/m} \geq \mathrm{gran}(X)$ and let us prove that $\mathrm{gran}(\gamma_\sigma^{-1}(X)) \leq 1$. Remark that if
$\mathrm{gran}(X) = 0$ then $X = \mathbb{N}^m$ or $X = \emptyset$. As in these two cases, we have $|\mathcal{A}(X)| = 1$,
we can assume that $\mathrm{gran}(X) \geq 1$. From $\gamma_\sigma^{-1}(X \cap Y) = \gamma_\sigma^{-1}(X) \cap \gamma_\sigma^{-1}(Y)$ and
$\gamma_\sigma^{-1}(\mathbb{N}^m \setminus X) = \mathbb{N}^m \setminus \gamma_\sigma^{-1}(X)$, we can assume that there exists $i \in \{1, \ldots, m\}$ such
that $X = \{x \in \mathbb{N}^m;\ x_i = \mathrm{gran}(X) - 1\}$. We have $\gamma_\sigma^{-1}(X) = \{x \in \mathbb{N}^m;\ (\gamma_\sigma(x))_i =
\mathrm{gran}(X) - 1\} = \{x \in \mathbb{N}^m;\ x_i = c\}$ where $c$ satisfies $c < 1$. Remark that
if $c \notin \mathbb{N}$ then $\gamma_\sigma^{-1}(X) = \emptyset$ and if $c \in \mathbb{N}$ then $c = 0$. In these two cases, we have
proved that $\mathrm{gran}(\gamma_\sigma^{-1}(X)) \leq 1$.

Next, let us prove that there exist at most $2^{3^m}$ interval-definable sets $X$
such that $\mathrm{gran}(X) \leq 1$. Remark that such a set is a finite union of sets defined
by a formula of the form $\bigwedge_{i \in I} (x_i = 0) \wedge \bigwedge_{i' \in I'} (x_{i'} \neq 0)$ where $I, I' \subseteq \{1, \ldots, m\}$
and $I \cap I' = \emptyset$. There are at most $3^m$ such conjunctions, since each variable is
either constrained to be 0, constrained to be nonzero, or left free. So, there exist
at most $2^{3^m}$ interval-definable sets whose granularity is bounded by 1.

Finally, let $X$ be an interval-definable set such that $\mathrm{gran}(X) \geq 1$ and consider
$k \geq 0$ such that $r^k \geq \mathrm{gran}(X) > r^{k-1}$. The number of states of $\mathcal{A}(X)$ is bounded
by $\sum_{i=0}^{m.k-1} r^i + 2^{3^m} \leq (r.\mathrm{gran}(X))^m + 2^{3^m}$. □

Next, we characterize the affine functions $f$ such that the inverse image of an
interval-definable set remains interval-definable.

Definition 9. Let $X$ be a subset of $\mathbb{N}^m$, $n \geq 0$ and $I \subseteq \{1, \ldots, m\}$, the set $X_{I,n}$
is defined by:

\[ X_{I,n} = \{x \in X;\ \forall i \in I,\ x_i = n;\ \forall i \notin I,\ x_i < n\} \]

Proposition 3. For any interval-definable set $X$ and for any $n \geq \mathrm{gran}(X)$, we
have:

\[ X = \bigcup_{I \subseteq \{1,\ldots,m\}} \Big( X_{I,n} + \sum_{i \in I} \mathbb{N}.e_i \Big) \]

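Proof. Let us consider a formula $\phi$ in the logic $\phi := v_i = c \mid \phi \vee \phi \mid \phi \wedge \phi \mid \neg\phi \mid
\mathit{true} \mid \mathit{false}$ such that $c < \mathrm{gran}(X)$ and such that the set of vectors satisfying
$\phi$ is equal to $X$. By developing $\phi$, we can assume that $\phi$ is a finite disjunction
of formulas of the form $\bigwedge_{j \in J_{\neq}} \neg(v_j = c_j) \wedge \bigwedge_{j \in J_{=}} (v_j = c_j)$ where $J_{\neq} \cap J_{=} = \emptyset$ and
$c_j < \mathrm{gran}(X)$. Remark that we can assume that
$\phi = \bigwedge_{j \in J_{\neq}} \neg(v_j = c_j) \wedge \bigwedge_{j \in J_{=}} (v_j = c_j)$ to prove the proposition.

Let $x \in X$ and let us prove that $x \in \bigcup_{I \subseteq \{1,\ldots,m\}} (X_{I,n} + \sum_{i \in I} \mathbb{N}.e_i)$. Let us
consider the set $I = \{i \in \{1, \ldots, m\};\ x_i \geq n\}$. As for any $j \in J_{=}$, we have
$x_j = c_j < \mathrm{gran}(X)$, we deduce $I \subseteq \{1, \ldots, m\} \setminus J_{=}$. Let us consider the vector
$y \in \mathbb{N}^m$ defined by $y_i = n$ if $i \in I$ and $y_i = x_i$ otherwise. As $x$ satisfies $\phi$, the
vector $y$ also satisfies $\phi$. Therefore $y \in X_{I,n}$. From $x \in y + \sum_{i \in I} \mathbb{N}.e_i$, we deduce
the inclusion $X \subseteq \bigcup_{I \subseteq \{1,\ldots,m\}} (X_{I,n} + \sum_{i \in I} \mathbb{N}.e_i)$. Let us prove the converse
inclusion. Let $x \in \bigcup_{I \subseteq \{1,\ldots,m\}} (X_{I,n} + \sum_{i \in I} \mathbb{N}.e_i)$. There exists $I \subseteq \{1, \ldots, m\}$
such that $x \in X_{I,n} + \sum_{i \in I} \mathbb{N}.e_i$. Let $y \in X_{I,n}$ such that $x \in y + \sum_{i \in I} \mathbb{N}.e_i$.
As $y \in X_{I,n} \subseteq X$, $y$ satisfies $\phi$. As for any $i \in I$, we have $y_i = n$, we have
$I \subseteq \{1, \ldots, m\} \setminus J_{=}$. From $x \in y + \sum_{i \in I} \mathbb{N}.e_i$, we deduce that $x$ satisfies $\phi$.
Therefore $x \in X$. □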
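
The decomposition of Proposition 3 can be checked by brute force on a small instance. The following Python sketch is only an illustration: the example set `in_x`, the dimension `M = 2`, the bound `N = 3` and the finite comparison box `BOX` are assumptions chosen for the example and do not come from the paper.

```python
from itertools import combinations, product

M = 2      # dimension m (assumed for the example)
N = 3      # any n >= gran(X); the example set only uses constants < 3
BOX = 7    # vectors are compared inside {0..BOX-1}^M only

def in_x(v):
    # Example interval-definable set: (v1 = 2) or not (v2 = 1).
    return v[0] == 2 or v[1] != 1

def x_slice(index_set):
    # X_{I,n}: vectors of X with x_i = n for i in I and x_i < n otherwise.
    ranges = [[N] if i in index_set else range(N) for i in range(M)]
    return [v for v in product(*ranges) if in_x(v)]

def decomposition(box):
    # Union over I of X_{I,n} + sum_{i in I} N.e_i, restricted to the box.
    result = set()
    for size in range(M + 1):
        for index_set in combinations(range(M), size):
            for base in x_slice(index_set):
                ranges = [range(base[i], box) if i in index_set else [base[i]]
                          for i in range(M)]
                result.update(product(*ranges))
    return result

direct = {v for v in product(range(BOX), repeat=M) if in_x(v)}
assert decomposition(BOX) == direct
```

Only the finitely many vectors of the slices $X_{I,n}$ are enumerated; every other element of the box is obtained by translating a slice element along the coordinates of $I$.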
Proposition 4. Let $f : D \rightarrow \mathbb{N}^m$ with $D \subseteq \mathbb{N}^m$ be an affine function. The two
following assertions are equivalent:

- $D$ is interval-definable.

- For any interval-definable set $X'$, $f^{-1}(X')$ is interval-definable.
Moreover, in this case, we have $\mathrm{gran}(f^{-1}(X')) \leq \mathrm{gran}(X') + \mathrm{gran}(D)$.



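Proof. Remark that if $f^{-1}(X')$ is interval-definable for any interval-definable
set $X'$, then in particular, as $\mathbb{N}^m$ is interval-definable, the definition domain
$D = f^{-1}(\mathbb{N}^m)$ is also interval-definable. So let us consider an affine function
$f : D \rightarrow \mathbb{N}^m$ such that $D \subseteq \mathbb{N}^m$ is interval-definable and let $X'$ be an
interval-definable set. We first prove that we can assume that $\mathrm{gran}(X') \geq 1$. In fact, if
$\mathrm{gran}(X') = 0$, then $X' = \emptyset$ or $X' = \mathbb{N}^m$. In the first case, we have $f^{-1}(X') = \emptyset$
and the set $f^{-1}(X')$ is an interval-definable set such that $\mathrm{gran}(f^{-1}(X')) =
0 \leq \mathrm{gran}(X') + \mathrm{gran}(D)$ and in the second case, we have $f^{-1}(X') = D$ and
the set $f^{-1}(X')$ is interval-definable and verifies $\mathrm{gran}(f^{-1}(X')) = \mathrm{gran}(D) \leq
\mathrm{gran}(X') + \mathrm{gran}(D)$. So, we can assume that $\mathrm{gran}(X') \geq 1$.

As $f$ is an affine function, there exists a square matrix $M \in \mathcal{M}_m(\mathbb{Q})$ and a
vector $v \in \mathbb{Q}^m$ such that $f(x) = M.x + v$ for any $x \in D$. Proposition 3 shows
that the sets $X'$ and $D$ can be decomposed as follows where $D_J = D_{J,\mathrm{gran}(D)}$ and
$X'_I = X'_{I,\mathrm{gran}(X')}$:

\[ D = \bigcup_{J \subseteq \{1,\ldots,m\}} \Big( D_J + \sum_{j \in J} \mathbb{N}.e_j \Big) \qquad
   X' = \bigcup_{I \subseteq \{1,\ldots,m\}} \Big( X'_I + \sum_{i \in I} \mathbb{N}.e_i \Big) \]

We have:

\[ \begin{aligned}
f^{-1}(X') &= \bigcup_{J \subseteq \{1,\ldots,m\}} \Big\{ x \in D_J + \sum_{j \in J} \mathbb{N}.e_j;\ f(x) \in X' \Big\} \\
&= \bigcup_{J \subseteq \{1,\ldots,m\}} \ \bigcup_{d \in D_J} \Big( d + \Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ f(d) + M.x \in X' \Big\} \Big) \\
&= \bigcup_{I, J \subseteq \{1,\ldots,m\}} \ \bigcup_{d \in D_J,\ x' \in X'_I} \Big( d + \Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ f(d) + M.x \in x' + \sum_{i \in I} \mathbb{N}.e_i \Big\} \Big)
\end{aligned} \]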



Let us consider a subset $J$ such that $D_J$ is not empty. In this case let us consider
$d \in D_J$ and remark that for every $j \in J$, we have $d + \mathbb{N}.e_j \subseteq D$. Therefore
$f(d) + \mathbb{N}.M.e_j \subseteq \mathbb{N}^m$. So, for every $j \in J$ and for every $i \in \{1, \ldots, m\}$ we have
$M_{ij} \geq 0$.

Now, let us consider a subset $I$ such that $X'_I$ is not empty and consider
$x' \in X'_I$. Remark that we have just to prove that the following set is
interval-definable and has a granularity bounded by $\mathrm{gran}(X')$:

\[ \begin{aligned}
&\Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ f(d) + M.x \in x' + \sum_{i \in I} \mathbb{N}.e_i \Big\} \\
&\quad = \bigcap_{i \notin I} \Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ f(d)_i + \sum_{j \in J} M_{ij}.x_j = x'_i \Big\} \\
&\qquad \cap \bigcap_{i \in I} \ \bigcap_{c_i \in \{0,\ldots,x'_i - 1\}} \Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ f(d)_i + \sum_{j \in J} M_{ij}.x_j \neq c_i \Big\}
\end{aligned} \]

Remark that for every $i \notin I$, we have $x'_i - f(d)_i < \mathrm{gran}(X')$ and for any $i \in I$ and
for any $c_i \in \{0, \ldots, x'_i - 1\}$, we have $c_i - f(d)_i < x'_i = \mathrm{gran}(X')$. Let us consider
the sequence $(a_j)_{j \in \{1,\ldots,m\}}$ in $\mathbb{N}$ defined by $a_j = M_{ij}$ if $j \in J$ and $a_j = 0$
otherwise. We have just to prove that for every $k < \mathrm{gran}(X')$ the following set
is interval-definable and has a granularity bounded by $\mathrm{gran}(X')$:

\[ \Big\{ x \in \sum_{j \in J} \mathbb{N}.e_j;\ \sum_{j \in J} a_j.x_j = k \Big\}
   = \Big( \sum_{j \in J} \mathbb{N}.e_j \Big) \cap \Big\{ x \in \mathbb{N}^m;\ \sum_{j=1}^{m} a_j.x_j = k \Big\} \]

The granularity of the set $\sum_{j \in J} \mathbb{N}.e_j$ is bounded by 1 and as $\mathrm{gran}(X') \geq 1$,
we have just to prove that for any sequence $(a_j)_{j \in \{1,\ldots,m\}}$ in $\mathbb{N}$ and for any
$k < \mathrm{gran}(X')$, the following set has a granularity bounded by $\mathrm{gran}(X')$:

\[ \Big\{ x \in \mathbb{N}^m;\ \sum_{j=1}^{m} a_j.x_j = k \Big\} \]

Let us consider the set $J' = \{j \in \{1, \ldots, m\};\ a_j \geq 1\}$ and let $Y = \{y \in
\mathbb{N}^m;\ \forall j \notin J',\ y_j = 0;\ \sum_{j=1}^{m} a_j.y_j = k\}$. Remark that for any $y \in Y$ and for
any $i \in \{1, \ldots, m\}$, we have $y_i \leq k$. Therefore $Y$ is finite. Moreover, from
$\{x \in \mathbb{N}^m;\ \sum_{j=1}^{m} a_j.x_j = k\} = Y + \sum_{j \notin J'} \mathbb{N}.e_j$, we are done. □

We can now bound the asymptotic size of $\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))$ in function of $k$.

Theorem 3. Let $S$ be an affine counters system with interval-definable
definition domains and let $X'$ be an interval-definable set. The asymptotic size in $k$
of $\mathrm{Pre}_S^{\leq k}(X')$ is in $O(k^m)$.

Proof. Let $c_S = \max_{a \in \Sigma} \mathrm{gran}(D_a)$. From proposition 4, we deduce that
for an interval-definable set $X'$, the set $\mathrm{Pre}_S(X')$ is interval-definable and
$\mathrm{gran}(\mathrm{Pre}_S(X')) \leq \mathrm{gran}(X') + c_S$. Therefore $\mathrm{gran}(\mathrm{Pre}_S^{\leq k}(X')) \leq c' + k.c_S$
where $c' = \mathrm{gran}(X')$. From proposition 2, we deduce $|\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))| \leq
(r.(c' + c_S.k))^m + 2^{3^m}$. □
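
To get a feeling for the bound used in the proof of Theorem 3, the short Python sketch below evaluates $(r.(c' + c_S.k))^m + 2^{3^m}$ for a few values of $k$ and prints $2^k$ next to it. The function name `sda_size_bound` and the sample parameters ($r = 2$, $m = 2$, $c' = 3$, $c_S = 1$) are assumptions made for the illustration, not values from the paper.

```python
def sda_size_bound(r, m, c_x, c_s, k):
    # (r.(c' + c_S.k))^m + 2^(3^m): polynomial in k for fixed r, m, c', c_S.
    return (r * (c_x + c_s * k)) ** m + 2 ** (3 ** m)

for k in (1, 10, 100):
    # Third column: the size reached if every step doubled the automaton.
    print(k, sda_size_bound(r=2, m=2, c_x=3, c_s=1, k=k), 2 ** k)
```

The constant term $2^{3^m}$ dominates for small $k$, but it does not depend on $k$, so the growth in $k$ stays polynomial of degree $m$.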



Corollary 2. Let $S$ be an affine counters system with interval-definable
definition domains and $X'$ be an interval-definable set. We can compute in polynomial
time in $k$ the minimal SDA $\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))$.

Remark 8. When the sets $D_a$ and $X'$ are upward closed (an upward closed set
$X$ is a subset of $\mathbb{N}^m$ such that $X + \mathbb{N}^m = X$), the sequence $\mathrm{Pre}_S^{\leq k}(X')$ converges
as any increasing sequence of upward closed sets.

Remark 9. The bound $O(k^m)$ follows directly from proposition 2. In the proof
of this proposition, we have assumed that no sharing appears in the SDA
representing an interval-definable set. However, in practice, SDA are like BDD
and the asymptotic size of $\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))$ is in $O(m.\ln(k))$ rather than $O(k^m)$.

When the sets $D_a$ and $X'$ are just Presburger-definable, the following
proposition 5 shows that the asymptotic size in $k$ of $\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))$ may be exponential.

Proposition 5. Let $S = (\mathbb{N}^2, \{a\}, (f_a))$ where $f_a(x_1, x_2) = (r.x_1, x_2)$ over $D_a =
\mathbb{N}^2$, and let $X' = \{(x'_1, x'_2) \in \mathbb{N}^2;\ x'_1 = x'_2\}$. For any integer $k \geq 0$, we have:

\[ |\mathcal{A}(\mathrm{Pre}_S^{\leq k}(X'))| \geq r^{k-1} \]

Proof. Let $X_i$ be the subset of $\mathbb{N}^2$ defined for any $i \geq 0$ by $X_i = \{(x, r^i.x);\ x \in
\mathbb{N}\}$. We have $\mathrm{Pre}_S^{\leq k}(X') = \bigcup_{i=0}^{k} X_i$. Assume by contradiction that there exists a
SDA $A = (Q, \Sigma_r, \delta, \{q_0\}, F)$ that represents $\mathrm{Pre}_S^{\leq k}(X')$ and such that $\mathrm{card}(Q) <
r^{k-1}$. Let us consider the finite language $\mathcal{L} = (00 + \cdots + (r-1)0)^{k-1}10$. For
any word $\sigma \in \mathcal{L}$, we have $\delta(q_0, \sigma) \in Q$. As $\mathrm{card}(Q) < r^{k-1} = \mathrm{card}(\mathcal{L})$, there
exist two words $\sigma \neq \sigma'$ in $\mathcal{L}$ such that $\delta(q_0, \sigma) = \delta(q_0, \sigma')$. Let $y, y' \in \mathbb{N}$ such
that $\rho_2(\sigma) = (y, 0)$ and $\rho_2(\sigma') = (y', 0)$. We have $y, y' \in \{r^{k-1}, \ldots, r^k - 1\}$.
Let us consider a word $w \in \Sigma_r^*$ such that $\rho_2(w) = (0, y)$. From $\rho_2(\sigma.w) =
\rho_2(\sigma) + r^k.\rho_2(w) = (y, r^k.y)$, we deduce that $\rho_2(\sigma.w) \in X_k$. As $A$ is a SDA that
represents $\bigcup_{i=0}^{k} X_i$, we have proved that $\sigma.w \in \mathcal{L}(A)$. From $\delta(q_0, \sigma) = \delta(q_0, \sigma')$,
we deduce that $\sigma'.w \in \mathcal{L}(A)$. Therefore $(y', r^k.y) = \rho_2(\sigma'.w) \in \bigcup_{i=0}^{k} X_i$. There
exists $i \in \{0, \ldots, k\}$ such that $(y', r^k.y) \in X_i$. We have $r^k.y = r^i.y'$. From
$y \geq r^{k-1}$ and $y' < r^k$, we deduce $i > k - 1$. Hence $i = k$ and we have proved
that $y = y'$. As $\sigma$ and $\sigma'$ have the same length and as $\rho_2(\sigma) = \rho_2(\sigma')$, we have
$\sigma = \sigma'$. We have a contradiction. □
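
The identity $\mathrm{Pre}_S^{\leq k}(X') = \bigcup_{i=0}^{k} X_i$ used at the start of the proof can be sanity-checked numerically. The Python sketch below is an illustration under assumed values ($r = 2$, $k = 3$, and a finite box of pairs); the helper names are hypothetical.

```python
R = 2       # basis r (assumed value)
K = 3       # number of steps k (assumed value)
BOX = 40    # only pairs in {0..BOX-1}^2 are compared

def f_a(x1, x2):
    # The single action of the system of Proposition 5.
    return (R * x1, x2)

def in_pre_leq_k(x1, x2, k):
    # (x1, x2) is in Pre^{<=k}(X') iff f_a applied i <= k times lands in X'.
    for _ in range(k + 1):
        if x1 == x2:
            return True
        x1, x2 = f_a(x1, x2)
    return False

def in_union(x1, x2, k):
    # Membership in the union of X_i = {(x, r^i.x)} for i = 0..k.
    return any(x2 == R ** i * x1 for i in range(k + 1))

assert all(in_pre_leq_k(a, b, K) == in_union(a, b, K)
           for a in range(BOX) for b in range(BOX))
```

Each $X_i$ is a line of slope $r^i$, which is why the automaton has to remember the first component for about $k$ digits before it can compare it with the second one.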
Abstract. Abstractions often introduce infinite traces which have no
corresponding traces at the concrete level and may lead to failure of the 
verification. Refinement does not always help to eliminate those traces. 
In this paper, we consider a timer abstraction that introduces a cyclic 
behaviour on abstract timers and we show how one can exclude cycles by 
imposing a strong fairness constraint on the abstract model. By employ- 
ing the fact that the loop on the abstract timer is a self-loop, we render 
the strong fairness constraint into a weak fairness constraint and embed 
it into the verification algorithm. We implemented the algorithm in the 
DTSpin model checker and showed its efficiency on case studies. The 
same approach can be used for other data abstractions that introduce 
self-loops. 



1 Introduction 

Abstraction techniques are widely used to make the verification of complex/para- 
meterised/infinite systems feasible. Abstraction, intuitively, means replacing one 
semantic model by an abstract, in general, simpler one. The abstraction needs
to be safe, which means that every property checked to be true on the abstract
model holds for the concrete one as well. This allows the transfer of positive
verification results from the abstract model to the concrete one.

The concept of safe abstraction is well-developed within the Abstract
Interpretation framework [8,9,12]. The relation between the concrete model and its
safe abstraction is formalized there as a requirement on the relation between the
data operations of the concrete system and their abstract counterparts. Every
value of the concrete state space is mapped by the abstraction function $\alpha$ into an
abstract value that "describes" the concrete value. As an example consider the
abstraction of integers into their signs in which e.g. $-3$ is mapped by $\alpha$ into neg.
For every operation (function) $f$ on the concrete level, an abstraction $f^\alpha$ needs
to be defined which "mimics" $f$. In general, the abstraction can be
nondeterministic. For example, addition ($+$) over the integers is abstracted into an operation
($+^\alpha$) such that pos $+^\alpha$ neg may yield pos or neg nondeterministically. This is
formally captured by letting $f^\alpha$ be a function into the powerset over the domain
of abstract values.
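
The sign abstraction can be written down in a few lines. The Python sketch below is an illustration, not the paper's definition: the function names are hypothetical, and a separate abstract value `zero` is assumed so that the result of pos $+^\alpha$ neg covers every possible concrete sum.

```python
from itertools import product

def sign(n):
    # Abstraction function alpha: maps an integer to its sign.
    return "neg" if n < 0 else "zero" if n == 0 else "pos"

def abs_add(a, b):
    # Abstract addition: the set of signs the concrete sum may have.
    if a == "zero":
        return {b}
    if b == "zero" or a == b:
        return {a}
    return {"neg", "zero", "pos"}   # pos + neg: the sign is not determined

# The abstract operation mimics the concrete one on a sample of integers.
assert all(sign(x + y) in abs_add(sign(x), sign(y))
           for x, y in product(range(-5, 6), repeat=2))
```

The last assertion is the "mimicking" requirement checked on a finite sample: the sign of every concrete sum is among the results offered by the abstract addition.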



S. Graf and L. Mounier (Eds.): SPIN 2004, LNCS 2989, pp. 198—215, 2004. 
© Springer-Verlag Berlin Heidelberg 2004

Fig. 1. Abstracted timer



Working within the Abstract Interpretation framework guarantees the
preservation (in the direction from the abstract to the concrete model) of the truth
of formulas of temporal logics without existential quantification over paths, e.g.
$\Box L_\mu^+$ (i.e., all formulas of the $\mu$-calculus without negation and containing only
the $\Box$ operator) or next-free LTL [20,11]. Counterexamples can be spurious. In
case a counterexample is found, the abstraction should be refined and the
refined model is then model-checked. Such a sequence of refinements can happen
to be infinite; in this case one needs different techniques to prove or disprove the
property.

In this paper we consider a simple abstraction for (discrete) timers similar to
the one from [10]. This abstraction is often used to prove that a property holds
for all instantiations of settings of a timer that are greater than or equal to some
value $k$. It leaves all values below $k$ unchanged and maps all other values to the
abstract value $k^+$. Being a deterministic operation on the concrete model, the
time progress operation tick becomes non-deterministic on the abstract one (see
Fig. 1). That introduces infinite traces with $k^+ \xrightarrow{tick} k^+$ being chosen whenever
tick is enabled. As a result, the timer never expires, which, in general, does not
correspond to any trace of the concrete model. For instance, properties of the
form $\Box(\phi \rightarrow \Diamond\psi)$ get disproved on the abstract model whenever they depend on
the fact that the timer in question eventually expires after being set. Refining
the model by taking a greater value for $k$, we still keep the loop at the new $k^+$. So,
refinement gives no solution to this problem.

The systems we consider are specified as parallel compositions of commu- 
nicating processes. A process consists of a number of locations, variables and 
a number of transitions connecting the locations and changing the valuations 
of variables. Processes can communicate by rendezvous/buffered message pass- 
ing and through shared memory. There are explicit timing constraints in the 
specification imposed by timer operations. 

We assume that the properties are given in the universal fragment of the
$\mu$-calculus, $\Box L_\mu$, consisting of formulas in which the negations are applied only
to atomic propositions. The verification methodology we propose works for any
formula of the universal fragment without negation $\Box L_\mu^+$ and, under certain
conditions that occur relatively often in practice (for instance, if the formula
does not refer to abstracted variables (timers)), for the whole $\Box L_\mu$.$^1$



$^1$ Since any formula can be rewritten into an equivalent $\Box L_\mu^+$ formula, and any
formula of the universal unrestricted $\mu$-calculus has an equivalent formula in $\Box L_\mu$
(see e.g. [20]), this is not a significant loss of generality.

To exclude the infinite loop $k^+ \xrightarrow{tick} k^+$ that causes spurious counterexamples,
we impose a strong fairness condition $\Phi^\alpha$ on the abstract model, which we call
t-fairness: "For any trace where $k^+ \xrightarrow{tick} (k - 1)$ is infinitely often enabled,
$k^+ \xrightarrow{tick} (k - 1)$ is infinitely often taken or t is infinitely often set to a new
value". We show that the concrete property $\Phi$ that corresponds to the t-fairness
condition $\Phi^\alpha$ trivially holds on the concrete model. Therefore, in order to prove a
formula $\phi$ on the concrete system we check the validity of the formula $\Phi^\alpha \rightarrow \phi^\alpha$
on the abstract one, where $\phi^\alpha$ is the corresponding abstract version of $\phi$. If
$\Phi^\alpha \rightarrow \phi^\alpha$ holds, we conclude that $\phi$ holds on the concrete system. It should be
emphasized though that we use fairness only to eliminate unwanted traces in the
abstract system. We do not lift fairness constraints from the concrete system to
the abstract system.

By exploiting some specifics of the class of systems we are working with, we 
show that the strong fairness criterion can be reformulated into a weak fairness 
one. When one deals with explicit model checking, this is often a significant 
advantage because algorithmically, it could be easier to deal with the latter. 
Moreover, when one stays in the realm of explicit-state model checking, it is much 
more efficient to build the t-fairness check into the model checking algorithm, 
instead of expressing it as a formula. In this case, one can check for the validity
of $\phi$ on the abstract model, assuming a built-in t-fairness check. The t-fairness
check algorithm we propose here is inspired by Choueka's flag algorithm [5], and
it is a version of the algorithm for weak process fairness which is implemented
in Spin.

We implemented our algorithm in DTSpin [3] (a discrete-time version of 
the Spin model checker [15]) and tested the prototype implementation on some 
examples from the literature with encouraging results. 

Related work. Counter abstractions similar to the timer abstraction we use 
are quite standard and they can be traced to [21]. Such abstractions are often 
applied to abstract (discrete) timers for the verification of safety properties (see 
e.g. [10]). We study here the verification of liveness properties, which gives rise 
to the use of fairness requirements on the abstract model. 

There are several papers that deal with the problem of eliminating spurious 
execution sequences caused by abstraction. The closest to our approach is the 
theory of linear abstraction from [17] (also described in [18]). The general method 
of data abstraction presented there can also suffer from the problem of spurious 
execution sequences. To eliminate those, it is suggested to augment the system 
under consideration by an auxiliary monitoring module (executed synchronously 
with the system) and then to abstract the system obtained by such a
composition. In one of the examples, [17] features a three-valued counter abstraction
($\{0, 1, 2^+\}$, using our notation). Thus, one could apply the idea of a monitoring
process to eliminate extra sequences introduced by self-loops to abstract states.
However, this would lead to a solution based on strong fairness on the transition
level. The monitor labels the "critical" transitions with $-1$ or $+1$. The (strong)
fairness criterion requires that if a $-1$ transition is executed infinitely often then
also a $+1$ transition is executed infinitely often. This ensures leaving the
artificial self-loops in the abstract state space introduced by the abstraction. As it
was already emphasized, we show that in the context of timer abstraction, such
a straightforward strong fairness can be transformed into a weak one, which is
a significant advantage in the context of explicit model checking.

In [22] the authors present a three-valued counter abstraction in the context
of the verification of parameterized systems, i.e., networks of N identical con- 
current processes, where N is an arbitrary finite number. The counters count 
the number of processes at a particular control (program) location. The solution 
to the problem of spurious execution sequences also in this case boils down to 
strong fairness. To this end two new variables from and to are introduced. The 
unwanted self-looping sequences are eliminated by the natural requirement that 
for each process location l if the processes enter l infinitely many times, then 
they must also leave it infinitely many times. 

The problem of parameterized networks of processes is also treated in [1], with
a solution for the spurious sequences which resembles both of the above given 
approaches. The role of the monitors from [17] is played by “ranking functions”, 
similar to the ones used to ensure the termination of sequential programs. The 
ranking functions count how many processes have executed a particular transi- 
tion in the concrete system. By abstracting a ranking function value, similarly 
to [22], one obtains a separation of the "critical" transitions into "negative" and
"positive" ones. The "marking algorithm" which solves the problem of
spurious sequences is based on strong fairness. The efficiency remarks in favor of our
solution in the context of explicit model checking would also apply to [1] and [22].

$\alpha$-Spin [13] is an extension of Spin with abstraction. The abstraction
framework of $\alpha$-Spin is based on the Abstract Interpretation theory and in that regard
it is similar to our approach. However, to the best of our knowledge, there is no
work that deals with spurious executions in the context of $\alpha$-Spin. Another
approach to use abstractions in combination with Spin can be found in [15].

The paper is organised as follows: In Section 2 we describe the timer
abstraction and introduce the notion of t-fairness. In Section 3 we present the verification
algorithm. In Section 4 we describe our implementation of t-fairness in DTSpin.
In Section 5 we discuss some experimental results. Finally in Section 6 we give
some conclusions.

2 Timer Abstraction and Fairness 

Currently, model checkers provide some facilities to (automatically) reduce a 
state space, like partial-order reduction techniques. These techniques deal mainly 
with the control flow of a model. On the contrary, data (values stored and trans- 
mitted in a system), whose domain is often infinite or very large, are not handled 
by them; it is a task of a user to present data in a verification model in a finite 
form of reasonable size. Depending on the property to be verified, the actual 
values of data may sometimes be ignored or replaced by some abstract values. 
In an abstract model, the operations on data are mimicked by new ones on the
abstract data. The main requirement for an abstraction is that the abstract
system behaviour should correctly reflect the behaviour of the original system with
respect to a verification task in the sense that (1) an abstraction should capture
all essential points in the system behaviour, i.e., not be "too abstract", and (2)
an abstraction should be safe.



Abstraction and Abstract Interpretation Framework. We first give a
brief overview of the formal aspects of abstraction (see [1,20] for more detail).

Let the semantics of a concrete system $M$ be given with the corresponding
transition system $T = (S, R)$, where $S$ is a set of states and $R \subseteq S \times S$ is a
transition relation. Given an abstract state space ${}^{\alpha}S$ and a total description
relation $\rho_S \subseteq S \times {}^{\alpha}S$, we derive the pair of (monotonic) functions $\alpha$ and $\gamma$,
where $\alpha : 2^S \rightarrow 2^{{}^{\alpha}S}$ is the point-wise lifting of $\rho_S$ to the sets of states, i.e.
$\alpha = \mathit{post}[\rho_S]$, and $\gamma : 2^{{}^{\alpha}S} \rightarrow 2^S$ is the inverse image of $\rho_S$, i.e. $\gamma = \mathit{pre}[\rho_S]$
(post and pre are the standard relation post- and preimage). Intuitively, $\rho_S$ induces a
simulation relation between $T$ and ${}^{\alpha}T$ (cf. [20]). We say that ${}^{\alpha}T = ({}^{\alpha}S, {}^{\alpha}R)$
is an abstraction of $T$ with regard to $\alpha$, denoted as $T \sqsubseteq_\alpha {}^{\alpha}T$, iff $\forall Q \subseteq {}^{\alpha}S$ :
$\mathit{post}[R](\gamma(Q)) \subseteq \gamma(\mathit{post}[{}^{\alpha}R](Q))$. As a consequence, a given formula $\phi$ holds on
$T$ if it holds on ${}^{\alpha}T$, under the condition that $\gamma(I^\alpha(\phi)) \subseteq I(\phi)$ for corresponding
interpretations $I, I^\alpha$. The preservation result holds for formulas of temporal
logics without existential quantification over paths, e.g. $\Box L_\mu^+$ or LTL [20,11].
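
The condition $\mathit{post}[R](\gamma(Q)) \subseteq \gamma(\mathit{post}[{}^{\alpha}R](Q))$ can be checked exhaustively on a toy instance. The Python sketch below is an illustration only: it assumes a single concrete timer counting down from 4 and the abstract values 0, 1 and `"2+"`, and all names are hypothetical.

```python
from itertools import combinations

S = range(5)                              # concrete states: timer values 0..4
R = {(s, s - 1) for s in S if s > 0}      # concrete transitions: count down
ABS = [0, 1, "2+"]                        # abstract state space

def rho(s):
    # Description function: values 2, 3, 4 are all described by "2+".
    return s if s < 2 else "2+"

def post(rel, states):
    return {t for (s, t) in rel if s in states}

def gamma(abs_states):
    # Inverse image of rho: all concrete states described by abs_states.
    return {s for s in S if rho(s) in abs_states}

def is_abstraction(abs_rel):
    subsets = (set(c) for n in range(len(ABS) + 1)
               for c in combinations(ABS, n))
    return all(post(R, gamma(q)) <= gamma(post(abs_rel, q)) for q in subsets)

with_loop = {(1, 0), ("2+", 1), ("2+", "2+")}
assert is_abstraction(with_loop)
assert not is_abstraction(with_loop - {("2+", "2+")})
```

The second assertion shows why the self-loop on `"2+"` cannot simply be dropped: without it, the concrete step from 3 to 2 has no abstract counterpart.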

Given an abstraction function $\alpha$, let $\phi$, $\phi^\alpha$ be $\mu$-calculus formulas with the
corresponding sets of atomic propositions $\mathcal{P}, \mathcal{P}^\alpha$. The semantics of $\phi$ and $\phi^\alpha$
is given with the interpretation functions $I : \mathcal{P} \rightarrow 2^S$ and $I^\alpha : \mathcal{P}^\alpha \rightarrow 2^{{}^{\alpha}S}$,
respectively. Let $p^\alpha$ be the proposition that corresponds to the subset $I^\alpha(p^\alpha) =
\alpha(I(p)) - \alpha(I(\neg p))$ of the abstract state space ${}^{\alpha}S$. We say then that $p^\alpha$ is a
contracting abstraction of $p$ under $\alpha$ [17]. (Note that $p^\alpha$ can be considered as
interpretation of $p$ under $I^\alpha : \mathcal{P} \rightarrow 2^{{}^{\alpha}S}$.) We call $\phi^\alpha$ a contracting abstraction
of formula $\phi$ if $\phi^\alpha$ is obtained by replacing each atomic proposition $p$ in $\phi$ with
its contracting abstraction $p^\alpha$.

Abstraction function $\alpha$ is consistent$^2$ with $I$ iff for each $p \in \mathcal{P}$ : $\alpha(I(p)) \cap
\alpha(I(\neg p)) = \emptyset$, i.e. the images by $\alpha$ of the interpretations of $p$ and $\neg p$ are not
contradictory [20]. (Consistency of $\gamma$ with $I^\alpha$ is defined analogously.) In this
case, we call the contracting abstraction $\phi^\alpha$ a consistent abstraction of $\phi$. Note
that for all $s \in S$, $s^\alpha \in {}^{\alpha}S$ such that $s^\alpha \in \alpha(\{s\})$, $s \models p$ iff $s^\alpha \models p^\alpha$ precisely
when $p^\alpha$ is consistent, and $s \models p$ if $s^\alpha \models p^\alpha$ when $p^\alpha$ is contracting.
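
A small example may help to separate the two notions. The Python sketch below is an assumption-laden illustration: it takes concrete timer values 0..4, describes values of 2 or more by `"2+"`, and computes the contracting interpretation $\alpha(I(p)) - \alpha(I(\neg p))$ for two hypothetical propositions.

```python
S = set(range(5))                 # concrete states: timer values 0..4

def rho(s):
    # Description function: values 2, 3, 4 are all described by "2+".
    return s if s < 2 else "2+"

def alpha(states):
    # Point-wise lifting of rho to sets of states.
    return {rho(s) for s in states}

def contracting(interp_p):
    # I^alpha(p^alpha) = alpha(I(p)) - alpha(I(not p))
    return alpha(interp_p) - alpha(S - interp_p)

def consistent(interp_p):
    # The images of I(p) and I(not p) must not overlap.
    return not (alpha(interp_p) & alpha(S - interp_p))

expired = {0}       # p: "the timer value is 0"
large = {3, 4}      # p: "the timer value is at least 3"
assert contracting(expired) == {0} and consistent(expired)
assert contracting(large) == set() and not consistent(large)
```

The proposition `large` cuts through the abstract value `"2+"`, so its contracting abstraction is empty: it is still safe, but no abstract state can be used to establish it.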

Theorem 1. Let $T = (S, R)$ and ${}^{\alpha}T = ({}^{\alpha}S, {}^{\alpha}R)$ be two transition systems
such that $T \sqsubseteq_\alpha {}^{\alpha}T$, with interpretation functions $I, I^\alpha$ defined as above. Given
a $\Box L_\mu^+$ (resp. $\Box L_\mu$) formula $\phi$, let $\phi^\alpha$ be a contracting (resp. consistent with $I$)
abstraction of $\phi$. Then ${}^{\alpha}T \models \phi^\alpha$ implies $T \models \phi$.

Proof. The theorem is a corollary of Theorem 2, item 1 B, from [20]: Observe that
$\gamma(I^\alpha(p^\alpha)) \subseteq I(p)$ and that the consistency of $\alpha$ with $I$ implies the consistency of
$\gamma$ with $I^\alpha$. □

$^2$ Consistent abstraction corresponds to the notion of precise abstraction from [17].

Often it is more convenient to apply abstractions directly on the model $M$
than on its transition system $T$. Such an abstraction on the level of $M$ is
well-developed within the Abstract Interpretation framework [8,9,12]. The
requirement that Abstract Interpretation imposes on the relation between the concrete
model $M$ and its safe abstraction $M^\alpha$ can be formalized as a requirement on
the relation between the data and the operations of the concrete system and
their abstract counterparts as follows: Each value of the concrete domain $\Sigma$ is
mapped by a description function $\rho_d$ into a value from the abstract domain ${}^{\alpha}\Sigma$.
The abstract value "describes" the concrete value. We assume an ordering $\preceq$ on
the abstract domain ${}^{\alpha}\Sigma$ according to the "precision" of abstract values: given
a concrete value $x$ and its abstract description $x^\alpha = \rho_d(x)$, we say that any
$y^\alpha \in {}^{\alpha}\Sigma$ such that $x^\alpha \preceq y^\alpha$ is a less precise description of $x$.

For every operation (function) $f$ on the concrete data domain, an abstract
function $f^\alpha$ is defined, which "mimics" $f$. (For simplicity, we assume $f$ to be a
unary operation.) In general, the abstraction can be nondeterministic. This is
formally captured by letting $f^\alpha$ be a function into the powerset over the domain
of abstract values. The requirement of mimicking is then formally phrased with
the following safety statement: $\forall x \in \Sigma \ \exists y \in f^\alpha(\rho_d(x)) : \rho_d(f(x)) \preceq y$.

A state $s$ can be seen as a valuation vector $(v_0, v_1, \ldots, v_{n-1})$ and, thus,
$S = \Sigma_0 \times \ldots \times \Sigma_{n-1}$, with $\Sigma_0, \ldots, \Sigma_{n-1}$ being the corresponding data domains.
Assuming the same set of variables as in $M$, the state space of $M^\alpha$ is ${}^{\alpha}S =
{}^{\alpha}\Sigma_0 \times \ldots \times {}^{\alpha}\Sigma_{n-1}$. We relate $S$ and ${}^{\alpha}S$ via the description relation, which in our
case is the function $\rho_S : S \rightarrow {}^{\alpha}S$ defined as $\rho_S(s) = (\rho_{d_0}(v_0), \ldots, \rho_{d_{n-1}}(v_{n-1}))$,
where $\rho_{d_0}, \ldots, \rho_{d_{n-1}}$ are description functions for the corresponding variables.
(We assume a trivial (identity) mapping as description function for unabstracted
variables.)

Let $M^\alpha$ be obtained by replacing each constant $c$ and function $f$ of $M$ with
their abstract versions, $T^\alpha = (S^\alpha, R^\alpha)$ be the transition system that corresponds
to $M^\alpha$, and let the safety statement hold for all functions. Obviously $S^\alpha = {}^{\alpha}\Sigma_0 \times
\ldots \times {}^{\alpha}\Sigma_{n-1} = {}^{\alpha}S$. Moreover, for "usual" modelling languages, like Promela,
$R^\alpha \supseteq {}^{\alpha}R$, which can be shown e.g. following 4.4.1 from [9]. This trivially implies
${}^{\alpha}T \sqsubseteq_{\alpha'} T^\alpha$, where $\alpha'$ is the identity function. Thus, by Theorem 1 a given
formula $\phi^\alpha$ is preserved from $T^\alpha$ via ${}^{\alpha}T$ to $T$, i.e., from $M^\alpha$ to $M$.



Timer Abstraction. We employ the concept of timers to specify timing
conditions imposed on the system. Each timer is related to a certain process and
modelled by a timer variable. We denote the value of a timer $t$ at a state $s$ as $[t]_s$.
A timer can be activated by setting. Timer variables are mapped to integers; $-1$
represents a deactivated timer and larger values stand for active timers. A setting
step set(t, e) leads to the change of the timer value to the value given by
expression e. A predicate expire(t) is true iff $[t] = 0$. The transitions are assumed
to be instantaneous. Time progression is modelled as a special transition called
tick that decreases values of all active timers by 1 and leaves deactivated timers
unmodified. Further, we refer to a segment of time separated by time progress
steps as a time slice. We leave the semantics of time partially open here, since
our approach does not depend on it. (We revisit this issue in Section 3.)

To prove that some property holds for all settings of a timer that are greater
than or equal to some value $k$, one often uses a timer abstraction similar to the one
of [10]. For a timer $t$, the concrete domain of timer values $\Sigma_t = \mathbb{N} \cup \{-1\}$ is
replaced with the abstract domain ${}^{\alpha}\Sigma_t = \{-1, 0, \ldots, k_t - 1, k_t^+\}$, where the
value $k_t$ is a positive value defined by the user assuming that the verification
property still holds even if we do not distinguish between the values of the timer
greater than or equal to $k_t$. We overload the notation by using $c$ ($-1 \leq c < k_t$) as
an abstract value representing the single concrete value $c$, while $c^+$ describes the
set of concrete values $\{c, c + 1, c + 2, \ldots\}$.

The description function $\rho_{d_t}$ is defined as $\rho_{d_t}(c) = c$, if $c < k_t$, and $\rho_{d_t}(c) =
k_t^+$, otherwise. Abstract operations on timers are defined in an intuitive way:
setting a timer to value $x$ becomes setting it to value $\rho_{d_t}(x)$, $\mathit{expire}^\alpha(a)$ is true
iff $a = 0$, and $\mathit{tick}^\alpha$ is a non-deterministic operation that changes the value of a
timer from $a$ to $b$ according to the following rules: (1) if $a = -1$ then $b = -1$,
(2) if $0 \leq a < k_t$ then $b = a - 1$ (where "$-$" works on abstract values as on
integers), (3) if $a = x^+$ then $b \in \{x^+, x - 1\}$.

Lemma 2. System $M^\alpha$ built from system $M$ according to the rules given above
is a safe abstraction of $M$.

Proof. By a simple check that the safety statement is satisfied. □
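
The "simple check" of the safety statement can be spelled out for the tick operation. The Python sketch below is illustrative: the threshold `K = 3` and all function names are assumptions, the abstract value $k_t^+$ is encoded as the string `"k+"`, and a timer with value 0 is assumed to be decreased to $-1$ by tick like any other active timer.

```python
K = 3   # abstraction threshold k_t (assumed value)

def describe(c):
    # Description function: values below k_t are kept, the rest become "k+".
    return c if c < K else "k+"

def tick(c):
    # Concrete tick: active timers (c >= 0) decrease, deactivated ones stay -1.
    return c - 1 if c >= 0 else c

def tick_abs(a):
    # Abstract tick: returns the set of possible successor values.
    if a == "k+":
        return {"k+", K - 1}     # rule (3): stay in k+ or drop to k_t - 1
    if a == -1:
        return {-1}              # rule (1): deactivated timers are unchanged
    return {a - 1}               # rule (2): exact values decrease by 1

# Safety statement on a sample: the description of the concrete successor
# is always one of the abstract successors.
assert all(describe(tick(c)) in tick_abs(describe(c)) for c in range(-1, 50))
```

The check uses membership, which is stronger than the ordering required by the safety statement, so passing it is sufficient for this sample.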

From now on we assume that systems under consideration have no deadlocks
and no infinite zero-time cycles (infinite traces with a finite number of ticks). The
absence of zero-time cycles can be checked on the abstract model by verifying the
property $\Box\Diamond\mathit{tick}^\alpha$, which is a consistent abstraction of $\Box\Diamond\mathit{tick}$. The absence of
deadlocks follows straightforwardly from the fact that time can progress even when
no other action is possible in the system, and thus the tick action is still possible.



Fair timer abstraction. An abstracted system contains more behavior than 
the original one. Therefore, positive verification results can be transferred from 
the abstract to the concrete system, while counterexamples can be spurious. 
Abstraction refinement is a common technique used in case spurious
counterexamples are found (see e.g. [6]), though just a change of the granularity
level does not always help: the sequence of refinements can turn out to be infinite.

Suppose we use the timer abstraction described above to prove that some
property holds for all timer settings greater than or equal to some $k_t$. Due to
the non-determinism introduced with the abstract version of tick, it becomes
possible that the timer once set will never expire. That means that the states
that are always reachable in the concrete system are not reached in the abstract
system if the $k_t^+ \to k_t^+$ step is always chosen. Such a trace gives a spurious
counterexample: in the concrete system the timer expires after a finite number of
time slices. The only possible refinement is taking the same abstraction with
a greater value of $k_t$. But the same trace where the timer never expires is still
possible, so a counterexample would be produced again. Therefore, we need a
different technique to cope with the problem.

Imposing a strong fairness condition that requires that for any trace where
transition $k_t^+ \to (k_t - 1)$ is infinitely often enabled it is infinitely often
taken gives incorrect results: one can easily build a (concrete) model where a
timer $t$ is infinitely often set to a new value (before it expires), so it can be
seen every time as a new variable in the one-assignment setting. This observation
leads us to the following definition of t-fairness:

Definition 3. Given an LTS $T$ of a system with a set of abstract timers $TVar^{\alpha}$,
we say that a trace of $T$ is t-fair iff for any $t \in TVar^{\alpha}$ the following holds:
$k_t^+ \to (k_t - 1)$ is infinitely often enabled implies that $k_t^+ \to (k_t - 1)$ is
infinitely often executed or $set(t, x)$, $x \in {}^{\alpha}\Sigma_t$, is infinitely often
executed.

This definition has a strong fairness pattern. Interestingly, due to the fact 
that the loop introduced on a timer with the abstraction is a self-loop, this 
requirement can be reformulated as a condition with a weak fairness pattern: 

Lemma 4. A trace $\xi$ of $T$ is t-fair iff for any $t \in TVar^{\alpha}$ the following holds:
if there exists an infinite suffix $\sigma$ of $\xi$ such that $[\![t]\!]_{s_i} = k_t^+$ for
every state $s_i$ of $\sigma$, then $set(t, k_t^+)$ is infinitely often executed along the
trace.

Proof. Let $p$, $q$, and $r$ denote the propositions (from Def. 3)
"$k_t^+ \to (k_t - 1)$ is enabled", "$k_t^+ \to (k_t - 1)$ is executed", and
"$set(t, x)$, $x \in {}^{\alpha}\Sigma_t$, is executed", respectively. In these terms,
t-fairness reads

    $\Box\Diamond p \rightarrow (\Box\Diamond q \vee \Box\Diamond r)$.    (1)

We can split the proposition $r$ into a disjunction of two propositions $r_1$ and
$r_2$: "$set(t, x)$, where $x \neq k_t^+$, is executed" and "$set(t, k_t^+)$ is
executed", respectively. After straightforward transformations, (1) becomes

    $\neg(\Box\Diamond p \wedge \Diamond\Box(\neg q \wedge \neg r_1)) \vee \Box\Diamond r_2$.    (2)

We will show that $\Box\Diamond p \wedge \Diamond\Box(\neg q \wedge \neg r_1)$ (*) is
semantically equivalent to $\Diamond\Box p'$, where $p'$ denotes the proposition
"the value of $t$ is $k_t^+$".

The conjunct $\Box\Diamond p$ says that $k_t^+ \to (k_t - 1)$ is infinitely often
enabled. Since we assume the absence of zero-time cycles, by the timer abstraction
definition, this is equivalent to the proposition "timer $t$ has value $k_t^+$
infinitely often". The conjunct $\Diamond\Box(\neg q \wedge \neg r_1)$ says that after
some point in the execution sequence neither $k_t^+ \to (k_t - 1)$ nor $set(t, x)$,
with $x \neq k_t^+$, is executed. As these transitions are the only ones that can
change the value of $t$ from $k_t^+$ to a value different from $k_t^+$, we can conclude
that from some point on the value of $t$ will remain $k_t^+$ forever.

For the other direction, we first observe that if $t$ has value $k_t^+$ from some
point on, then $k_t^+ \to (k_t - 1)$ is enabled infinitely many times. (Again, we use
the absence of zero-time cycles, i.e., a tick transition is executed infinitely
often along any execution sequence.) Also, the other conjunct of (*) follows
immediately: as $k_t^+ \to (k_t - 1)$ and $set(t, x)$, where $x \neq k_t^+$, are the only
statements which can change the value of the abstract timer $t$ away from $k_t^+$,
they cannot be executed from that point on.

Thus, we can replace (*) with the equivalent proposition $\Diamond\Box p'$ and rewrite
(2) as $\Diamond\Box p' \rightarrow \Box\Diamond r_2$, which is the (weak t-fairness)
condition of Lemma 4. □
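
For readers who want the "straightforward transformations" from (1) to (2) spelled out, one possible chain of equivalences is sketched below. It uses only $r = r_1 \vee r_2$, the distribution of $\Box\Diamond$ over disjunction, the distribution of $\Diamond\Box$ over conjunction, and the duality $\Box\Diamond x \equiv \neg\Diamond\Box\neg x$. This is a reconstruction, not the authors' own derivation.

```latex
\begin{align*}
\Box\Diamond p \rightarrow (\Box\Diamond q \vee \Box\Diamond r)
  &\equiv \neg\Box\Diamond p \vee \Box\Diamond q \vee \Box\Diamond (r_1 \vee r_2) \\
  &\equiv \neg\Box\Diamond p \vee \Box\Diamond q \vee \Box\Diamond r_1 \vee \Box\Diamond r_2 \\
  &\equiv \neg\Box\Diamond p \vee \neg\Diamond\Box\neg q \vee \neg\Diamond\Box\neg r_1 \vee \Box\Diamond r_2 \\
  &\equiv \neg\bigl(\Box\Diamond p \wedge \Diamond\Box\neg q \wedge \Diamond\Box\neg r_1\bigr) \vee \Box\Diamond r_2 \\
  &\equiv \neg\bigl(\Box\Diamond p \wedge \Diamond\Box(\neg q \wedge \neg r_1)\bigr) \vee \Box\Diamond r_2
\end{align*}
```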

Thus we can express the t-fairness criterion by the LTL formula
$\Phi^{\alpha} = \bigwedge_{t \in TVar^{\alpha}} (\Diamond\Box p \rightarrow \Box\Diamond q)$,
where $p$ and $q$ are propositions corresponding to the terms
"$[\![t]\!]_{s_i} = k_t^+$" and "$set(t, k_t^+)$" from Lemma 4, respectively. Though
$\Phi^{\alpha}$ is formulated on states and transitions, it can be easily encoded as a
property defined on the states of the system. (To express the fact that some
transition $q$ is infinitely often taken, one can e.g. extend the model with
a boolean variable $b_q$ that is negated every time the transition is taken and
replace $\Box\Diamond q$ with $\Box\Diamond b_q \wedge \Box\Diamond \neg b_q$.)
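
The toggle-bit encoding in the parenthetical remark can be illustrated on a lasso-shaped trace (a finite prefix followed by a cycle that repeats forever). The sketch below is a hypothetical helper, not part of the paper's tooling. It assumes the cycle is given as a list of action labels, and it unrolls the cycle twice so that a transition taken an odd number of times per iteration still shows both values of the bit.

```python
from typing import Hashable, Sequence


def taken_infinitely_often(cycle_actions: Sequence[Hashable], q: Hashable) -> bool:
    """Check []<>b_q and []<>!b_q on the infinite repetition of a cycle.

    b_q is a boolean that is negated every time action q is taken. On the
    infinite repetition of the cycle, q is taken infinitely often iff b_q
    takes both truth values while the cycle is traversed repeatedly.
    """
    b_q = False
    values_seen = set()
    # Two traversals are enough: after them, b_q has toggled at least twice
    # if q occurs at all, so both values have been observed.
    for action in list(cycle_actions) * 2:
        if action == q:
            b_q = not b_q
        values_seen.add(b_q)
    return values_seen == {True, False}
```

The price of this encoding is the extra variable `b_q` in every state, which is one reason the state-based formula is more expensive to check than a fairness condition built into the search.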

One can see the analogy between $\Phi^{\alpha}$ and the definition of weak fairness for
processes, where a timer set to $k_t^+$ corresponds to an enabled process and an
execution of the set operation corresponds to an execution of an action by the
process. Further, one can show that the t-fairness criterion $\Phi^{\alpha}$ is a
consistent abstraction of the LTL formula
$\Phi = \bigwedge_{t \in TVar} (\Diamond\Box p' \rightarrow \Box\Diamond q')$, where $p'$, $q'$
are defined as "$[\![t]\!]_s \ge k_t$" and "$set(t, x)$, where $x \ge k_t$",
respectively. This can be done by a simple check that $p$ and $q$ are consistent
abstractions of $p'$ and $q'$, respectively. Indeed, let $s^{\alpha} \in \alpha(\{s\})$.
Timer $t$ has the value $k_t^+$ in the abstract state $s^{\alpha}$ iff $t$ has a value
greater than or equal to $k_t$ in $s$. Similarly, $t$ is set to some $x$ which is
greater than or equal to $k_t$ by a transition which has $s$ as the target state iff
it is set by a transition in the abstract system which ends up in the state
$s^{\alpha}$ with $[\![t]\!]_{s^{\alpha}} = k_t^+$.

Suppose we want to verify that $T \models \phi$ for some C1L+ (resp. DL^) formula
$\phi$ and a concrete system $T$ without infinite zero-time traces. The "concrete"
version of the abstract t-fairness condition, $\Phi$, holds on any trace of $T$: if
from some point on the value of timer $t$ remains greater than or equal to $k_t$,
then the timer must be infinitely often set to some value greater than $k_t$.
Otherwise, since tick happens infinitely often, the value of $t$ will eventually
become less than $k_t$. Thus, $T \models \phi$ iff $T \models (\Phi \rightarrow \phi)$.

By Theorem 1 we know that instead of verifying $T \models (\Phi \rightarrow \phi)$ on the
concrete system, we can verify its contracting (resp. consistent) abstraction
$(\Phi \rightarrow \phi)^{\alpha}$ on the abstract system. By the definition of contracting
(consistent) abstraction, the last formula is equivalent to
$\Phi^{\alpha} \rightarrow \phi^{\alpha}$. In case $\phi$ does not refer to variables (timers)
that are abstracted, the abstraction $\alpha$ is trivially a consistent abstraction
for all atomic propositions in $\phi$ and we have $\phi^{\alpha} = \phi$. If $\phi$ does
mention abstracted timers, one has to derive the contracting abstraction
$\phi^{\alpha}$ of $\phi$. Finally, by Theorem 1,
$T^{\alpha} \models (\Phi^{\alpha} \rightarrow \phi^{\alpha})$ implies
$T \models (\Phi \rightarrow \phi)$ and thus also $T \models \phi$.

Thus, by imposing the t-fairness condition on the abstract model, we eliminate
spurious counterexamples caused by unfair non-deterministic choices made by
abstract functions.

3 Incorporating t-Fairness into the Verification Algorithm

To express the formula $\Phi^{\alpha}$ as an LTL formula defined on the states of the
system, one needs to introduce additional variables (see Section 2). Therefore it
is computationally expensive to verify the formula $\Phi^{\alpha} \rightarrow \phi^{\alpha}$,
and it is more convenient to incorporate the t-fairness requirement into the
verification algorithm that verifies $\phi^{\alpha}$ by considering t-fair traces only.
In this section we describe how to embed the t-fairness check into a
model-checking algorithm for LTL.

Since there is a strong analogy between t-fairness and weak process fairness,
one can easily adapt any algorithm for model checking under weak process
fairness. The algorithm we propose here is inspired by the weak process fairness
algorithm used in Spin [15,2], which is a combination of the Nested Depth First
Search (NDFS) algorithm [7] and Choueka’s flag algorithm [5]. In the
automata-theoretic approach, to verify a property expressed by an LTL formula,
the negation of the formula is translated into a Büchi automaton, which is
combined with the transition system representing the state space of the system.
If the language accepted by the resulting automaton is empty, the property is
satisfied. As a result, the model checking problem is reduced to the
graph-theoretic problem of finding acceptance cycles, i.e., cycles that contain
states from a special designated set of accepting states. The absence of
acceptance cycles means that the property holds for the system. Further on we
assume that we work directly with the labelled transition system (LTS), which is
the product of the Büchi automaton and the LTS of the system.

Let $T = (S, Act, \to_T, s_{init}, F)$ be the LTS of a composition of the transition
system of a given abstract system with the Büchi automaton that represents the
negation of a property to be verified, where $S$ is a finite state space, $Act$ is a
set of actions, $\to_T \subseteq S \times Act \times S$ is a transition relation,
$s_{init} \in S$ is an initial state and $F \subseteq S$ is a set of accepting states.
Our goal is to construct an extension of $T$ that contains an acceptance cycle iff
there exists a t-fair acceptance cycle in $T$. (We say that a cycle
$s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \ldots s_n \xrightarrow{a_n} s_0$ is t-fair iff
for every $t \in TVar^{\alpha}$ there exists $i$ ($0 \le i \le n$) such that
$[\![t]\!]_{s_i} \neq k_t^+$ or $a_i = set(t, k_t^+)$.) Therefore, we will define this
extension in such a way that any acceptance cycle is t-fair by construction.
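
The cycle-level t-fairness condition in the parenthetical definition can be checked directly when a cycle is given explicitly. The following sketch assumes a hypothetical representation that is not taken from the paper: each step of the cycle is a pair `(state, action)`, a state is a dictionary from timer names to abstract values, and setting timer `t` to $k_t^+$ is encoded as the action tuple `("set", t, "k+")`.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

K_PLUS = "k+"  # symbolic abstract value k_t^+ (illustrative encoding)

State = Dict[str, object]
Step = Tuple[State, Hashable]   # (source state s_i, action a_i leaving s_i)


def is_t_fair_cycle(cycle: List[Step], timers: Iterable[str]) -> bool:
    """Return True iff every abstract timer is treated fairly on the cycle.

    A timer t is treated fairly if some state on the cycle has a value of t
    different from k+, or some action on the cycle is set(t, k+).
    """
    for t in timers:
        fair_for_t = any(
            state[t] != K_PLUS or action == ("set", t, K_PLUS)
            for state, action in cycle
        )
        if not fair_for_t:
            return False
    return True
```

For example, a one-state cycle whose only action is a tick that leaves timer `t` at `"k+"` is not t-fair, whereas the same cycle with an additional `("set", "t", "k+")` step is.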

Let the abstract system have $N$ abstract timers. Then we construct the
extended LTS $T' = (S', Act', \to_{T'}, s'_{init}, F')$ in the following way: the set
of states of the extended system is a set of pairs $(s, c)$, where $s \in S$ and
$0 \le c \le N$. We call $(s, c)$ a c-replica of $s$. (Note that not every replica
$(s, c)$ of a reachable state $s$ of $T$ will be reachable in $T'$.) 0-replicas are
the basic replicas of the states, while replicas $1, \ldots, N$ allow tracking the
behaviour of abstract timers $t_1, \ldots, t_N$, respectively. All the accepting
states and the initial state of $T'$ are 0-replicas of the accepting states and
the initial state of $T$, respectively. All transitions from accepting states lead
to 1-replicas only. Transitions from a c-replica $(s, c)$, related to timer $t_c$,
lead either to c-replicas or, when they guarantee t-fair behaviour w.r.t. timer
$t_c$, to the next $((c + 1) \bmod (N + 1))$-replica. Since all the accepting states
are 0-replicas, any acceptance cycle contains for every abstract timer at least
one transition that guarantees t-fairness.
The verification algorithm starts the construction of $T'$ from the initial state
$(s_{init}, 0)$ and proceeds by adding the 0-replicas in accordance with the
transition function $\to_T$ until an accepting state is met. If an accepting state
$s$ is encountered, the algorithm adds a dummy $\tau$-step that connects the
0-replica of $s$ with the 1-replica of the same state. A move from a c-replica
with $1 \le c \le N$ to the $((c + 1) \bmod (N + 1))$-replica happens when a state is
encountered in which $t_c$ has a value different from $k_{t_c}^+$ or a step setting
timer $t_c$ is taken, i.e. when the t-fairness condition for $t_c$ is fulfilled.
(A move from a 0-replica to a 1-replica is possible only by $\tau$-steps connecting
the replicas of the same accepting state.) For the rest, the algorithm adds states
following the transition function $\to_T$.

Theorem 5. Given an LTS $T = (S, Act, \to_T, s_{init}, F)$ with abstract timers
$t_1, \ldots, t_N$ and its smallest extension $T' = (S', Act', \to_{T'}, s'_{init}, F')$
that satisfies the following conditions:

1. $Act' = Act \cup \{\tau\}$;

2. $s'_{init} = (s_{init}, 0)$;

3. $(s, 0) \xrightarrow{a}_{T'} (s_1, 0)$ if $(s, 0) \in S'$ and $s \xrightarrow{a}_T s_1$ and
   $s \notin F$;

4. $(s, 0) \xrightarrow{\tau}_{T'} (s, 1)$ if $(s, 0) \in S'$ and $s \in F$;

5. $(s, c) \xrightarrow{a}_{T'} (s_1, c_1)$ if $(s, c) \in S'$ and $c > 0$ and
   $s \xrightarrow{a}_T s_1$, with $c_1 = (c + 1) \bmod (N + 1)$ if
   ($[\![t_c]\!]_s \neq k_{t_c}^+$ or $a = set(t_c, k_{t_c}^+)$), and $c_1 = c$ otherwise;

6. $F' = S' \cap \{(s, 0) \mid s \in F\}$.

Then the following statements hold:

1. $(S, Act, \to_T, s_{init})$ and $(S', Act', \to_{T'}, s'_{init})$ are branching
   bisimilar.

2. $T$ contains a reachable t-fair acceptance cycle iff $T'$ contains a reachable
   acceptance cycle.

Proof. 1. Consider $Q \subseteq S \times S'$ where $(s, s') \in Q$ iff $s' = (s, c)$ with
$0 \le c \le N$. It is straightforward to check by case analysis that $Q$ is a weak
bisimulation. Since system $T$ is $\tau$-free, $T$ and $T'$ are branching bisimilar [24].

2. Notice that all acceptance cycles of the extended state space are t-fair by
construction: an acceptance cycle contains at least one accepting state; this
state is a 0-replica and has outgoing transitions to 1-replicas only. As
transitions from a c-replica lead either to c-replicas, or to "neighbour"
$((c + 1) \bmod (N + 1))$-replicas ($0 \le c \le N$), for any $c$ the cycle includes a
c-replica $(s, c)$, $s \in S$. Every move from a c-replica to its neighbour satisfies
the t-fairness condition for timer $t_c$, so for every abstract timer there is a
transition in the cycle satisfying the t-fairness condition and thus the cycle is
t-fair.

Due to the bisimulation result, any acceptance cycle of $T'$ (which is always
t-fair) has a corresponding t-fair acceptance cycle in $T$.

In the opposite direction: assume that there is a trace $s_{init} \to \ldots$ in
$T$ that contains a fair acceptance cycle. Then there are $s_i, s_j$ such that
$s_i = s_j$ with $j > i$. The path $\pi$ from $s_i$ to $s_j$ contains at most
$m = (j - i)$ distinct states. The trace
$\sigma = s_{init} \to \ldots s_i \ldots s_j \ldots s_j$ going through the cycle $N + 1$
times is also a valid trace of $T$. Due to the bisimulation result, there is a
trace $\sigma'$ in $T'$ that
Procedure 6 (dfs(s, c)).

    add (s, c) to S'                                   // add a pair to the state space
    if c = 0 and s ∈ F                                 // 0-replica and state s is accepting
    then if (s, 1) ∉ S' then dfs(s, 1);                // τ-step from 0-replica to 1-replica
    else
        for all s -a->_T s1 do                         // for all transitions enabled in s
            if c > 0 and (a = set(t_c, k+_{t_c}) or [t_c]_s ≠ k+_{t_c})
                                                       // t-fairness condition
            then c1 = (c + 1) mod (N + 1)              // the next replica number
            else c1 = c;                               // the same replica number
            if (s1, c1) ∉ S' then dfs(s1, c1);         // recursive call
        od;

Fig. 2. Generating t-fair extension of S



mimics $\sigma$. The suffix $\xi'$ of $\sigma'$ that mimics passing through the cycle
$N + 1$ times contains at least $m(N + 1) + 1$ states. The states of $\xi'$ are
replicas of the states of $\pi$, therefore at most $m(N + 1)$ of them are distinct.
Thus, there is at least one state that is present in $\xi'$ twice, and $\xi'$ is a
cycle.

Now we shall show that $\xi'$ is an acceptance cycle. We denote the suffix of
$\sigma$ corresponding to $\xi'$ as $\xi$ and pick an arbitrary state $s$ of $\xi$. Then
$\xi'$ contains some state $(s, c)$, $0 \le c \le N$. Assume that $c > 0$. Since $\xi$ is
a t-fair cycle, there are some states $q_1, q_2$ reachable from $s$ such that
$q_1 \xrightarrow{a} q_2$ and ($[\![t_c]\!]_{q_1} \neq k_{t_c}^+$ or
$a = set(t_c, k_{t_c}^+)$). Hence there exists a transition from the c-replica of
$q_1$ to the $((c + 1) \bmod (N + 1))$-replica of $q_2$ in $\xi'$. Proceeding in the
same way, we will obtain transitions leading to some
$((c + 2) \bmod (N + 1))$-replica, etc., and eventually we arrive at a 0-replica.
Thus, we conclude that $\xi'$ contains at least one 0-replica of some state. In
$T'$, transitions from 0-replicas of non-accepting states lead to 0-replicas, and
transitions from 0-replicas of accepting states lead to 1-replicas. Since $\xi$
contains an accepting state, due to the bisimulation result, $\xi'$ contains an
accepting state as well and thus it is an accepting cycle of $T'$. □

We call the extension T’ a t-fair extension of T. An algorithm that generates 
the extended state space in a depth first search (DFS) manner is given in Fig. 2. 
It is straightforward to prove the following claim: 

Lemma 7. Given an LTS $T$, let $T'$ be a system produced from system $T$ by
applying Procedure 6. Then $T'$ is a t-fair extension of $T$.
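
As an illustration of Procedure 6, the following is a minimal executable sketch of the replica construction, written iteratively with an explicit stack. The interface is an assumption of this sketch and not the authors' implementation: `succ(s)` yields `(action, next_state)` pairs of $T$, `timer_is_kplus(s, c)` tells whether timer $t_c$ has value $k_{t_c}^+$ in `s`, `sets_to_kplus(a, c)` tells whether action `a` is $set(t_c, k_{t_c}^+)$, and the replica counter wraps modulo $N + 1$ as in condition 5 of Theorem 5. At least one abstract timer is assumed.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Node = Tuple[Hashable, int]                  # (state s, replica number c)
Edge = Tuple[Node, Hashable, Node]


def t_fair_extension(
    init: Hashable,
    succ: Callable[[Hashable], Iterable[Tuple[Hashable, Hashable]]],
    accepting: Set[Hashable],
    n_timers: int,
    timer_is_kplus: Callable[[Hashable, int], bool],
    sets_to_kplus: Callable[[Hashable, int], bool],
) -> Tuple[Set[Node], List[Edge]]:
    """Build the reachable part of the extended LTS T' (sketch of Procedure 6)."""
    states: Set[Node] = set()
    edges: List[Edge] = []
    stack: List[Node] = [(init, 0)]
    while stack:
        s, c = stack.pop()
        if (s, c) in states:
            continue
        states.add((s, c))                    # add a pair to the state space
        if c == 0 and s in accepting:
            # tau-step from the 0-replica to the 1-replica of the same state
            moves = [("tau", (s, 1))]
        else:
            moves = []
            for a, s1 in succ(s):             # all transitions enabled in s
                fair = c > 0 and (sets_to_kplus(a, c) or not timer_is_kplus(s, c))
                c1 = (c + 1) % (n_timers + 1) if fair else c
                moves.append((a, (s1, c1)))
        for a, target in moves:
            edges.append(((s, c), a, target))
            stack.append(target)
    return states, edges
```

With a single accepting state whose only transition is a tick that keeps the one abstract timer at $k^+$, the construction produces the replicas `(s, 0)` and `(s, 1)` but no edge back to `(s, 0)`, so the unfair self-loop does not yield an acceptance cycle in the extension.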

To detect acceptance cycles, DFS is extended with a cycle-check procedure
(Fig. 3). Whenever Procedure 8 detects an accepting state, it starts Procedure 9,
which is again a DFS, that reports an acceptance cycle if the seed state is
matched within the cycle-check. Here we omit a detailed description of the NDFS
algorithm and refer the interested reader to [7].
Procedure 8 (ndfs1(s, c)).

    add (s, c, 0) to S'                                // add a pair to the state space
    if c = 0 and s ∈ F                                 // 0-replica, and state s is accepting
    then if (s, 1, 0) ∉ S' then ndfs1(s, 1);           // τ-step from 0-replica to 1-replica
    else
        for all s -a->_T s1 do                         // for all transitions enabled in s
            if c > 0 and (a = set(t_c, k+_{t_c}) or [t_c]_s ≠ k+_{t_c})
                                                       // t-fairness condition
            then c1 = (c + 1) mod (N + 1)              // the next replica number
            else c1 = c;                               // the same replica number
            if (s1, c1, 0) ∉ S' then ndfs1(s1, c1);    // recursive call
        od;
    if c = 0 and s ∈ F then seed := (s, 0, 1); ndfs2(s, 0);
                                                       // set the seed and start ndfs2

Procedure 9 (ndfs2(s, c)).

    add (s, c, 1) to S'                                // add a pair to the state space
    if c = 0 and s ∈ F                                 // 0-replica, and state s is accepting
    then if (s, 1, 1) ∉ S' then ndfs2(s, 1);           // τ-step from 0-replica to 1-replica
    else
        for all s -a->_T s1 do                         // for all transitions enabled in s
            if c > 0 and (a = set(t_c, k+_{t_c}) or [t_c]_s ≠ k+_{t_c})
                                                       // t-fairness condition
            then c1 = (c + 1) mod (N + 1)              // the next replica number
            else c1 = c;                               // the same replica number
            if seed = (s1, c1, 1) then REPORT CYCLE!   // seed is matched, report the cycle
            else if (s1, c1, 1) ∉ S' then ndfs2(s1, c1);   // recursive call
        od;

Fig. 3. NDFS version of Procedure 6



The correctness of the algorithm is given by the following claim: 

Theorem 10. Given an LTS $T$, Procedure 8 called with $(s_{init}, 0)$ reports an
acceptance cycle iff there exists a reachable t-fair acceptance cycle in $T$.

Proof. Follows from the correctness of the NDFS algorithm from [7] by observing
that the algorithm is actually NDFS from [7] applied on the extended state space
$T'$. □

The last result completes the series of claims that guarantee the soundness 
of the verification approach proposed in this paper. If no acceptance cycle is 
detected then the verified property holds for t-fair traces of the abstract system 
and therefore also for the concrete system. 
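
To make the combination of the replica construction with nested depth-first search more tangible, here is a compact recursive sketch in the spirit of Procedures 8 and 9. It is not the code generated by Spin or by the authors' tool. The callback interface (`succ`, `timer_is_kplus`, `sets_to_kplus`) is hypothetical, replicas wrap modulo $N + 1$ as in Theorem 5, at least one abstract timer is assumed, and recursion depth limits it to small examples.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Node = Tuple[Hashable, int]   # (state s, replica number c)


def has_t_fair_acceptance_cycle(
    init: Hashable,
    succ: Callable[[Hashable], Iterable[Tuple[Hashable, Hashable]]],
    accepting: Set[Hashable],
    n_timers: int,
    timer_is_kplus: Callable[[Hashable, int], bool],
    sets_to_kplus: Callable[[Hashable, int], bool],
) -> bool:
    """Nested DFS over the t-fair extension; True iff an acceptance cycle exists."""

    def ext_succ(node: Node) -> List[Node]:
        s, c = node
        if c == 0 and s in accepting:
            return [(s, 1)]                       # tau-step to the 1-replica
        result = []
        for a, s1 in succ(s):
            fair = c > 0 and (sets_to_kplus(a, c) or not timer_is_kplus(s, c))
            result.append((s1, (c + 1) % (n_timers + 1) if fair else c))
        return result

    visited_outer: Set[Node] = set()
    visited_inner: Set[Node] = set()

    def inner(node: Node, seed: Node) -> bool:
        """Second search: look for a path back to the seed."""
        visited_inner.add(node)
        for nxt in ext_succ(node):
            if nxt == seed:
                return True                       # seed is matched: cycle found
            if nxt not in visited_inner and inner(nxt, seed):
                return True
        return False

    def outer(node: Node) -> bool:
        """First search: start the second search at accepting states in postorder."""
        visited_outer.add(node)
        for nxt in ext_succ(node):
            if nxt not in visited_outer and outer(nxt):
                return True
        s, c = node
        return c == 0 and s in accepting and inner(node, node)

    return outer((init, 0))
```

On a one-state system whose only transition keeps the abstract timer at $k^+$, the function returns `False`, reflecting that the spurious loop is filtered out. Adding a transition that sets the timer to $k^+$, or a state in which the timer has a value below $k^+$, makes the cycle t-fair and the function returns `True`.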

The time complexity of the NDFS algorithm in Fig. 3 is $O(N \cdot |T|)$, where $N$ is
the number of timers and $|T|$ is the size (states and transitions) of the abstract
system state space. The memory needed to store $T'$ is virtually the same as that
for $T$. Instead of keeping each of the $N$ replicas $(s, i)$, $1 \le i \le N$, one
can save only the "useful" part $s$ plus an additional $2(N + 1)$ bits, as is done
for process fairness in Spin. The first $N + 1$ bits correspond to the replicas in
the main depth first search of the NDFS algorithm, while the second group of
$N + 1$ bits corresponds to the nested DFS. If bit $i$ of the first group is set
then this means that the state $(s, i)$ has been visited by the algorithm.
Similarly for the second group. As the description of $s$ is usually much larger
than $2(N + 1)$ bits, the bookkeeping overhead is negligible.
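
The bookkeeping scheme can be sketched as a visited table that stores one bit mask per state instead of one entry per replica. The class below is only an illustration of the idea under assumed names (`ReplicaBits`, `mark`, `seen`); Spin's actual state storage is organised differently.

```python
from typing import Dict, Hashable


class ReplicaBits:
    """Visited table storing 2*(N+1) bits per state (illustrative sketch).

    Bits 0..N record which replicas were visited by the main search,
    bits N+1..2N+1 record the same for the nested search.
    """

    def __init__(self, n_timers: int) -> None:
        self.width = n_timers + 1
        self.table: Dict[Hashable, int] = {}

    def _bit(self, replica: int, nested: bool) -> int:
        if not 0 <= replica < self.width:
            raise ValueError("replica number out of range")
        return 1 << (replica + (self.width if nested else 0))

    def mark(self, state: Hashable, replica: int, nested: bool = False) -> bool:
        """Mark (state, replica) as visited; return True if it was new."""
        bit = self._bit(replica, nested)
        old = self.table.get(state, 0)
        self.table[state] = old | bit
        return not (old & bit)

    def seen(self, state: Hashable, replica: int, nested: bool = False) -> bool:
        """Return True if (state, replica) has been visited by the given search."""
        return bool(self.table.get(state, 0) & self._bit(replica, nested))
```

The state description is stored once as the dictionary key, so visiting further replicas of an already stored state only flips bits in the associated mask.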

4 T-Fairness in DTSpin 

DTSpin [3] is a discrete-time extension of Spin [15] that has all verification 
features of Spin. It was successfully applied for debugging and verification of 
timed models of industrial size protocols (see e.g. [4,16]). DTSpin is designed for 
the verification of systems where delays are significantly larger than the duration 
of the events within the system. Therefore, system transitions are assumed to be 
instantaneous. DTSpin employs the concept of timers to express time aspects of 
a system. In DTPromela, the input language of DTSpin, timers are modelled 
by variables of a predefined type timer. The data domain and the operations on 
timers are defined as in Section 2. 

Since the system transitions are assumed to be instantaneous, time progress 
has the least priority in the system and may take place only when the sys- 
tem is blocked. A special process Timer ticks all the active timers down in case 
the system is blocked. DTSpin employs Promela’s statement timeout to check 
whether the system is blocked. To ensure that time progression has the least 
priority, the usage of timeout is reserved for the implementation of time progres- 
sion and forbidden in DTPromela specifications. Note that by the definition 
of tick, all DTPromela models are deadlock-free. 

To implement the timer abstraction defined in Section 2, we extend DTPromela
with a new data type $timer^{\alpha}$ for abstract timers and define the operations on
them as macros. The abstract version of tick, $tick^{\alpha}$, decreases values of
active abstract timers if they are different from $k_t^+$. If a timer has the
$k_t^+$ value, the non-deterministic choice is made between decreasing the value of
the timer to $(k_t - 1)$ and leaving it unmodified. Our fairness algorithm from
Section 3 is implemented by means of a PAN2TFPAN Java program that transforms the
pan verifier generated by Spin for the verification of the property without
t-fairness into a new one that checks the property under t-fairness. The
transformation is automatic and does not require any interaction with the user.

The user thus applies the following scheme for the verification: (1) Choose the
timers of a concrete model that should be abstracted and define a $k_t$ value for
each of those timers; (2) Redefine the type of the chosen timers to $timer^{\alpha}$
and redefine the set operations according to the $k_t$ values; (3) Check whether
the abstract system is free from zero-time cycles, i.e. check whether tick happens
infinitely often. This is done by checking the LTL formula $\Box\Diamond timeout$.
(In DTSpin, time progresses if the Promela statement timeout is true. Since this
statement may not be used in DTPromela specifications, $\Box\Diamond timeout$
expresses the absence of zero-time cycles.) (4) Formulate the abstract version of
the property to check and generate the pan verifier for this property;
(5) Transform the pan verifier with PAN2TFPAN into a new pan verifier, which
checks the property under the t-fairness condition. Positive verification results
imply that the property holds for the concrete system as well. If the property is
violated on the abstract system, a counterexample is generated, and the user
checks whether the counterexample is spurious or not.



5 Experimental Results 

In this section we describe some experimental results that show the efficiency
of our approach. Our test cases are the positive acknowledgment retransmission
protocol (PAR) [23] and Fischer’s mutual exclusion protocol [19]. We compare
the results obtained when we specify t-fairness as LTL formulas according to the
strong fairness and weak fairness patterns (we will refer to this as verifying
with strong/weak fairness, respectively) with the results obtained with our
prototype implementation of the algorithm from Section 3 in DTSpin, which we refer
to as built-in t-fairness. Our prime goal here is to compare the performance of
the three methods rather than to verify the protocols.



Experiments with the Positive Acknowledgment Retransmission Protocol (PAR).
PAR [23] is a classical example of a communication protocol
where time issues are essential for the correct functionality of the protocol. PAR
involves a sender, a receiver, a message channel and an acknowledgment channel.
The sender receives a frame from the upper layer, sends it to the receiver via
the message channel and waits for a positive acknowledgment from the receiver
via the acknowledgment channel. When the receiver has delivered the message to
the upper layer, it sends the acknowledgment to the sender. After the positive
acknowledgment is received, the sender becomes ready to send the next message.
The channels delay the delivery of messages. Moreover, they can lose or corrupt
messages. Therefore, the sender handles lost frames by timing out. If the sender
times out, it re-sends the message. As is well known, the protocol functions
correctly only under the following condition: the timeout of the sender should be
greater than the sum of the delays on the channels.

We specified PAR in DTPromela using concrete timers to represent delays
on the channels and the sender timeout. Our goal was to check that if the
channels do not lose messages continuously, no message reordering occurs and
no message gets lost, under the condition that the timeout of the sender is
greater than the sum of the (given) delays on the channels. To prove the property
for an arbitrary message sequence we used a well-known canonical abstraction
[14, 25] and defined two abstract environment processes: one representing the
upper layer of the sender and another one the upper layer of the receiver. Then
we abstracted the sender’s timer to check the property for all values greater than
the sum of the channels’ delays.
Table 1. PAR

pattern               states    transitions    memory (MB)   time
strong fairness       825761    5.10962e+06    52.286        0:21.00
weak fairness         227569    1.49527e+06    15.320        0:05.98
built-in t-fairness   100275    390012          6.693        0:01.56



Without t-fairness, the property gets violated, since there exists a trace where
the abstract timer of the sender never expires, staying in the loop
$k_t^+ \to k_t^+$ (we obtained a t-unfair trace as counterexample). Under the
t-fairness condition, we proved that the property holds. Table 1 contains
information on the time and memory consumption for the verification with DTSpin of
the property formulated with the strong and weak fairness patterns and for the
verifier with built-in t-fairness.
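
To read Table 1 as relative costs, the figures can be normalised against the built-in t-fairness row. The short script below does only this arithmetic. It assumes that the time column is in minutes:seconds, so that 0:21.00 means 21.00 seconds; the dictionary layout and the function name are illustrative.

```python
# Figures copied from Table 1; "seconds" assumes the time column is m:ss.
TABLE1 = {
    "strong fairness":     {"states": 825761, "transitions": 5.10962e6, "memory": 52.286, "seconds": 21.00},
    "weak fairness":       {"states": 227569, "transitions": 1.49527e6, "memory": 15.320, "seconds": 5.98},
    "built-in t-fairness": {"states": 100275, "transitions": 390012,    "memory": 6.693,  "seconds": 1.56},
}


def ratios(table, baseline="built-in t-fairness"):
    """Return each non-baseline row divided column-wise by the baseline row."""
    base = table[baseline]
    return {
        name: {col: row[col] / base[col] for col in row}
        for name, row in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, row in ratios(TABLE1).items():
        print(name, {col: round(value, 2) for col, value in row.items()})
```

Under that reading, the strong fairness pattern explores roughly eight times as many states as the built-in check on this model, and the weak fairness pattern a little more than twice as many.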



Fischer’s mutual exclusion protocol. Our second test example is Fischer’s
mutual exclusion protocol. The protocol uses time constraints and a shared
variable to ensure mutual exclusion in a system that consists of N processes
running in parallel and competing for a critical section. We assume that each
process has a unique id from 1 to N. The initial value of the shared variable $x$
is 0. When a process observes that $x$ is 0, it waits for at most $\delta_1$ time
units and then writes its id to $x$. After that, it waits for at least $\delta_2$
time units, and if $x$ still equals the process id, the process enters the critical
section. The process stays in the critical section for some time and then leaves
it.

We have specified Fischer’s mutual exclusion protocol in DTPromela using
concrete timers to represent delays not larger than $\delta_1$ and abstract timers
to represent delays which are at least $\delta_2$. As is well known, mutual
exclusion is ensured provided that $\delta_1 < \delta_2$. We have checked the property
that if there comes a request of access to the critical section, one of the
processes will get it. Table 2 contains results for strong, weak and built-in
t-fairness for the case of two, three and four processes. Note that the number of
abstracted timers in this example is equal to the number of processes.

Table 2. Fischer’s mutual exclusion

pattern               num. of proc.   states         transitions    memory (MB)   time
strong fairness       2               41384          171586           4.363       0:00.46
weak fairness         2               4705           13053            2.724       0:00.08
built-in t-fairness   2               1236           4181             1.573       0:00.01
strong fairness       3               3.28599e+06    2.01406e+07    190.539       1:01.79
weak fairness         3               115874         362068           8.561       0:01.22
built-in t-fairness   3               21592          110332           2.700       0:00.26
strong fairness       4               out of memory
weak fairness         4               2.60665e+06    9.2549e+06     151.729       0:38.34
built-in t-fairness   4               346903         2.45733e+06     20.927       0:05.69

The experiments were done on an AMD Athlon(TM) XP 2400+ with 1 GB of
memory. In all experiments, the verification with built-in t-fairness took
significantly less time and memory than the verification with strong and weak
fairness patterns expressed as LTL formulas. The prototype implementation
pan2tfpan and the models can be found at www.cwi.nl/~ustin/tfair.html.

6 Conclusion 

In this paper we considered a timer abstraction that introduces a cyclic
behavior on abstract timers that is not present at the concrete level. This could
lead to spurious counterexamples for liveness properties. We showed how one can
eliminate those by imposing a strong fairness constraint on the traces of the
abstract model. Using the fact that the loop on the abstract timer is a self-loop
for this abstract timer (though there is possibly no self-loop on the
corresponding LTS), we transformed the strong fairness constraint into a
constraint which has a weak fairness pattern, and embedded it into the
verification algorithm. Our experiments with the prototype implementation of the
algorithm were encouraging. We conjecture that the ideas in this paper can also be
used for other data abstractions that introduce self-loops on the abstracted data.
Abstract. In Promela, communication buffers are defined with a fixed length, and
buffer overflows can be handled in two different ways: block the send statement 
or lose the message. Both solutions change the semantics of the system, compared 
to one with unbounded channels. The question arises, if such buffer overflows can 
ever occur in a given system and what buffer lengths are sufficient to avoid them. 
We describe a scalable incomplete boundedness test for the communication buffers 
in Promela models, which is based on overapproximation and static analysis. We 
first reduce Promela models to systems of communicating finite state machines 
(CFSMs) and then apply further abstractions that leave us with a system of linear 
inequalities. Those represent the message sending and receiving effect that the 
control flow cycles of every process have on any message buffer. The test tries 
to establish the existence of a linear combination of the effect vectors so that at 
least one message can occur an unbounded number of times. If no such linear 
combination exists then the system is bounded. We discuss the complexity of 
this test and present experimental results using our implementation in the IBOC 
system. Scalability of the test is in part due to the fact that it is polynomial for the 
type of sparse control flow graphs derived from Promela models. Also, the analysis 
is local, i.e., it avoids the combinatorial state space explosion due to concurrency 
of the models. We also present a method to derive upper bound estimates for 
the maximal occupancy of each individual message buffer. Previously, we have 
applied this approach to UML RT models, while in this paper we focus on the 
additional problems specific to Promela code: determining the potential message 
types of any channel, tracking potential contents of variables, channels passed as 
arguments to processes, channel assignments, channel arrays and parallel process 
creation. 



1 Introduction 

In Promela. the input language of the SPIN model checker [7], inter-process communi- 
cation can be done via shared global variables or by message passing via communication 
channels that operate as first-in first-out (FIFO) buffers. These buffers are defined with 
a fixed length, and buffer overflows (i.e., an attempt to send a message to a full buffer) 
can be handled by SPIN in two different ways: block the send statement or lose the 
message. Both solutions change the semantics of the system, compared to one with 
unbounded channels. The question arises whether such buffer overflows can ever occur in 
a given system and what buffer lengths are sufficient to avoid them. Our paper presents 
an automated test for the occurrence of these buffer overflows in Promela. 

Of course, possible buffer overflows can be detected by simulation, or by encoding 
this question into an LTL model checking problem. However, this normally involves 
fully exploring the state space. Here we propose a type of boundedness analysis that 
avoids exhaustively checking all the computations of the model. We describe a scalable 
incomplete boundedness test for the communication buffers in Promela models, which 
is based on overapproximation and static analysis. For the test, we first interpret all 
communication buffers in the model as having unbounded length (instead of the fixed 
length in their definition) and then try to prove their boundedness, i.e., to establish 
upper bounds on the maximal reachable occupancy of every buffer. To do this, we first 
reduce Promela models to systems of communicating finite state machines (CFSMs) and 
then apply further abstractions that leave us with a system of linear inequalities. Those 
represent the summary message sending and receiving effect that the control flow cycles 
of every process have on any message buffer. The test tries to establish the existence of a 
linear combination of the resulting effect vectors so that at least one message can occur 
an unbounded number of times. If no such linear combination exists then the system is 
bounded. By similar techniques it is also possible to derive upper bound estimates for 
the maximal occupancy of each individual message buffer. 

Our test is: (i) Incomplete: Since boundedness for systems of CFSMs is undecidable [4], 
we work with an overapproximation of the Promela model. Hence, not every 
instance of a bounded system can be detected. (ii) Safe: If our test returns the result 
‘bounded’ for the overapproximation then the original Promela model is also bounded. 
The computed upper bounds for maximal occupancy of each individual message buffer 
also carry over to the original Promela model. (iii) Scalable: Scalability of the test is in 
part due to the fact that it is polynomial for the type of sparse control flow graphs derived 
from Promela models. Also, the analysis is local, i.e., it avoids the combinatorial state 
space explosion due to concurrency of the models. 

In precursory work [10] we have successfully applied this approach to boundedness 
checking of communication channels in UML RT [14,15] models, using our 
implementation in the IBOC (IMCOS Boundedness Checker) tool that we are currently 
developing. Promela differs from UML RT in a number of important aspects. In UML 
RT the different parallel processes in the system are represented by so-called capsules 
which communicate with each other only by message passing. These message passing 
channels are a priori assumed to be unbounded and the topology of the communication 
structure is defined statically at compile time. The capsule behaviors are defined through 
hierarchical state machines whose transitions are triggered solely by message-receive 
events. These transitions can also be labeled with arbitrary programming language code 
(which we abstract from in our UML RT analysis). Promela, on the other hand, is a 
concurrent programming language with concurrent processes, referred to as proctypes. 
Its control structure is much more flexible and versatile than that of UML RT, although 
state machines can easily be modeled in Promela. As opposed to UML RT, in Promela 
communication between proctypes can be via message passing or shared variables, and 
the communication topology can be dynamically changed. 

However, once the static code analysis is completed and the message passing effect 
vectors have been determined, the boundedness analysis for the Promela case is identical 
to the analysis in the UML RT case. The focus of this paper is therefore on the specific 







S. Leue, R. Mayr, and W. Wei 



problems that have to be addressed when analyzing Promela code in order to determine 
the message passing effect vectors. Issues that we will consider include 

- the identification of message types, since in receive statements these can be referred 
to by variables whose values are not statically known; 

- the passing of channel names as formal parameters during proctype instantiation; 

- the replication of identical proctype instances; 

- the assignment of channel variables; 

- the use of channel array data structures where the arrays are indexed by variables 
not known statically; 

- and the impact of unbounded proctype creation. 

Paper Outline. For the sake of self-containedness of this paper we review the principle 
of our boundedness test in Section 2. In Section 3 we describe the solutions to the specific 
issues in the application of the analysis to Promela. Experimental results are discussed 
in Section 4. We discuss related work in Section 5 and conclude in Section 6. 



2 Boundedness Analysis 

For the sake of self-containedness of this paper we now summarize the general principle 
of our boundedness analysis. A more detailed description can be found in [10]. 

First, we consider a sequence of conceptual abstractions for Promela models. In 
every step we obtain a coarser overapproximation of the previous model, for which 
the boundedness problem is easier to solve. All behavior of the original system is also 
possible in the overapproximations, i.e., they are monotonous w.r.t. simulation preorder. 
Furthermore, the abstractions preserve the (upper bounds on the) number of messages in 
every communication channel (buffer) of the Promela model. In practice, in the IBOC 
tools, all these abstractions are done in a single step. 

Level 0: Promela code. We start with the original system model described in Promela, 
except that we a priori assume that buffers have arbitrary length. For this model (Promela 
with arbitrary length buffers) boundedness is, of course, undecidable, since the buffers 
could be used to simulate a Turing-machine tape. 

Level 1: CFSMs. First, we abstract from the general program code in the model, i.e., 
variables, arithmetic, etc. We retain only the finite control structure of the program and 
the message passing behavior via unbounded buffers representing the communication 
channels. We obtain a system of communicating finite-state machines (CFSMs), 
sometimes also called FIFO-channel systems [1]. For the CFSM model boundedness is also 
undecidable [4]. 

Level 2: Parallel-Composition-VASS. In the next step we abstract from the order of the 
messages in the buffers and consider only the number of messages of any given type. For 
example, the buffer with contents abbacb would be represented by the integer vector 
(2, 3, 1), representing 2 messages of type a, 3 messages of type b and 1 message of type 
c. Also we abstract from the ability to test explicitly whether a given buffer is empty. 
We so obtain a vector addition system with states (VASS) [3]. More exactly, we obtain 
a parallel-composition-VASS. This is a VASS whose finite control is the parallel 
composition of several finite automata. Each part of this parallel composition corresponds 
to the finite control of some part of the CFSM of level 1, and to the finite control of a 
process in the original Promela model. (Parallel-composition-VASS are as expressive, but 
more succinct than normal VASS.) The boundedness problem for parallel-composition-VASS 
is polynomially equivalent to the boundedness problem for Petri nets, which is 
EXPSPACE-complete [17]. 
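
The counting abstraction of level 2 can be illustrated with a minimal Python sketch. The function name `count_vector` and the encoding of a buffer as a string of one-letter message types are assumptions made for this illustration only; they are not part of IBOC.

```python
from collections import Counter

def count_vector(buffer_contents, message_types):
    """Abstract a FIFO buffer to message counts, forgetting the message order."""
    counts = Counter(buffer_contents)
    return tuple(counts[t] for t in message_types)

# The buffer "abbacb" from the text, with message types a, b, c.
print(count_vector("abbacb", "abc"))
```

Any reordering of the same messages yields the same vector, which is exactly the information the abstraction discards.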

Level 3: Parallel-Composition-VASS with Arbitrary Input. We now abstract from 
activation conditions of cycles in the control-graph of the VASS and assume instead that there 
are always enough messages, represented by tokens, present to start the cycle. As far as 
boundedness is concerned, we replace the problem ‘Is the system bounded if starting at 
the given initial configuration?’ by the problem ‘Is the system bounded for any finite 
initial configuration?’, also referred to as the structural boundedness problem. It has been 
shown in [10] that this structural boundedness problem for parallel-composition-VASS 
is co-NP-complete, unlike for standard Petri nets where it is polynomial [12,6]. 

Level 4: Independent Cycle System. Finally, we abstract from the fact that certain cycles 
in the control graph depend on each other. Instead we assume that all cycles are 
independent and any combination of them is executable infinitely often, provided that the 
combined effect of this combination on all places is non-negative. The unboundedness 
problem for this abstracted model then becomes the following question: Is there any 
linear combination (with non-negative integer coefficients) of the effects of simple 
cycles in the control graph, such that the combined effect is non-negative on all places 
and strictly positive on at least one place? Since we consider an overapproximation, 
the original Promela model is surely bounded if the answer to this question is ‘no’. 
Since these effects of simple cycles can be represented by integer vectors, we get the 
following problem: Given a set of integer vectors, does there exist a linear combination 
(with non-negative integer coefficients) of them, such that the result is non-negative in 
every component and strictly positive in at least one? This problem can be solved in time 
polynomial in the number of vectors by using linear programming techniques. 

However, the important aspect is that the time required is only polynomial in the 
number of simple cycles, unlike at level 3, where the problem is co-NP-hard even for a 
linear number of simple cycles. This is very significant, since for instances derived from 
typical Promela models, the number of simple cycles is usually small. This is because 
the typical control-flow graphs of Promela code are (like in programming languages) 
sparse and often very local. (This is also the general reason why caching works.) Thus 
the number of different simple cycles derived from this code is typically polynomial 
rather than (in the worst case) exponential. 

Overall Boundedness Test. From every simple cycle found in the control structure, a 
vector can be derived which describes its effect on the unbounded system part. Here, for 
the Promela model, the vector describes how many messages were altogether added to 
the buffer. For every buffer and every message type there is one component in each of 
the effect vectors. The component can be negative if in the cycle more messages of this 
type were removed from a buffer than added to it. The resulting semilinear system is 
unbounded if and only if there exists a linear combination with non-negative coefficients 
of the effect-vectors that is non-negative in every component and strictly positive in at 
least one component. Formally, this can be described as follows: Let $v_1, \ldots, v_n \in \mathbb{Z}^k$ 
be the effect-vectors of all simple cycles and let $v^j$ be the j-th component of the vector 
$v$. The question then is 

$$\exists x_1, \ldots, x_n \in \mathbb{N}_0.\ \sum_{i=1}^{n} x_i v_i \ge \vec{0} \ \wedge\ \exists j.\ \Big(\sum_{i=1}^{n} x_i v_i\Big)^j > 0.$$

This can easily be transformed into a system of linear inequations and solved by stan- 
dard linear programming tools. If this condition is true then our overapproximation is 
unbounded, but not necessarily also the Promela model. The unboundedness could sim- 
ply be due to the coarseness of the overapproximation. On the other hand, if the condition 
above is false, then our overapproximation is bounded, and thus our original Promela 
model is also bounded. Thus, this test yields an answer of the form “BOUNDED” in 
case no linear combination of the effect vectors satisfying the above constraint can be 
found, and “UNKNOWN” when such a linear combination exists. 




Fig. 1. Effect Graphs of a 2-Process Model 



Example. Figure 1 describes effect graphs obtained from two communicating processes. 
The left process sends messages ‘a’ or ‘b’ to the right one, and the right process sends 
messages ‘c’ to the left one. The three components of the vector describe how many 
messages ‘a’, ‘b’ or ‘c’ are written (positive values) or read (negative values) in a step. 
For example, in the step from $s_2$ to $s_3$ two messages ‘a’ and one message ‘b’ are written 
and one message ‘c’ is read. From this graph we obtain the effect vectors $v_1 = (4, 1, -2)$ 
and $v_2 = (-1, -1, 1)$ for the simple cycles. To represent the strict positivity condition in 
the linear inequation system we add the constraint $3x_1 - x_2 \ge 1$. The linear inequation solver 
returns infeasibility of this system of inequations, and we thus conclude a result of 
“BOUNDED”. 
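
The test on this example can be reproduced with the following self-contained Python sketch. It is not the IBOC implementation: instead of a linear programming tool it decides feasibility by Fourier-Motzkin elimination over the rationals, chosen here only to avoid external libraries (it is exponential in the worst case, unlike linear programming). Rational feasibility suffices because any rational solution can be scaled to an integer one. The names `feasible` and `boundedness_test` are hypothetical.

```python
from fractions import Fraction

def feasible(rows, nvars):
    """Decide whether some x satisfies sum(a[i] * x[i]) >= b for every (a, b) in rows."""
    rows = [([Fraction(c) for c in a], Fraction(b)) for a, b in rows]
    for var in range(nvars):
        pos, neg, rest = [], [], []
        for a, b in rows:
            (pos if a[var] > 0 else neg if a[var] < 0 else rest).append((a, b))
        for ap, bp in pos:
            for an, bn in neg:
                # Positive multipliers chosen so that the coefficient of var cancels.
                sp, sn = -an[var], ap[var]
                combined = [sp * p + sn * q for p, q in zip(ap, an)]
                rest.append((combined, sp * bp + sn * bn))
        rows = rest
    # Every remaining row reads 0 >= b.
    return all(b <= 0 for _, b in rows)

def boundedness_test(vectors):
    """Return "BOUNDED" or "UNKNOWN" for the effect vectors of all simple cycles."""
    n, k = len(vectors), len(vectors[0])
    rows = [([1 if j == i else 0 for j in range(n)], 0) for i in range(n)]  # x_i >= 0
    rows += [([v[j] for v in vectors], 0) for j in range(k)]  # each component >= 0
    rows.append(([sum(v) for v in vectors], 1))  # some component strictly positive
    return "UNKNOWN" if feasible(rows, n) else "BOUNDED"

print(boundedness_test([(4, 1, -2), (-1, -1, 1)]))
```

The last row is the constraint $3x_1 - x_2 \ge 1$ from the example: together with the non-negativity of every component, requiring the sum of all components to be at least 1 encodes strict positivity of at least one of them.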

Computing Bounds for Individual Buffers. A more refined problem is to compute upper 
bounds on the reachable lengths of individual buffers in the system. In particular, some 
buffers might be bounded even if the whole system is unbounded. Since normally not 
all buffers can reach maximal length simultaneously, the analysis is done individually 
for each buffer B. This can be done by solving a linear programming problem that 
maximizes a linear target function $f_B$ (length of B) on an abstraction level 4 system 
description. The basic idea is the following. Let p be a path from the initial configuration 




A Scalable Incomplete Test for Message Buffer Overflow 






to a configuration where B has maximal length. Then $p$ can be decomposed into a cyclic 
part $p_c$ and an acyclic part $p_a$. Since at abstraction level 4 we assume total independence 
of simple cycles, the effect of $p_c$ can be described by a linear combination of the vectors 
describing the effects of simple cycles. It thus suffices to maximize $f_B$ on this linear 
combination. Determining the maximal contribution of the acyclic part $p_a$ is harder since 
one has to consider all possible combinations of acyclic paths in all parallel processes. 
(This is generally exponential in the number of parallel processes.) Therefore we only 
compute an upper bound on the effect of $p_a$ as follows: Let $n$ be the number of parallel 
processes and $p_a^i$ the part of $p_a$ in the i-th process. Let $E(x)$ be the effect vector of path 
$x$. Then $E(p_a) = \sum_{i=1}^{n} E(p_a^i)$. Now we compute vectors $r_i$ which are upper bounds on 
$E(p_a^i)$, i.e., $\forall p_a^i.\ r_i \ge E(p_a^i)$. The $r_i$ are computed (in polynomial time) by maximizing 
individually every component of the possible effect of paths $p_a^i$. (For example, if in 
process $i$ there are two acyclic paths with effects (3, 1) and (1, 2) then $r_i = (3, 2)$.) It 
follows that $E(p_a) = \sum_{i=1}^{n} E(p_a^i) \le \sum_{i=1}^{n} r_i$ and thus 

$$E(p) = E(p_c) + E(p_a) \le E(p_c) + \sum_{i=1}^{n} r_i.$$

It then only remains to solve the linear optimization problem of $f_B$ on 
$E(p)$ over $E(p_c)$ as explained above. 
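
The component-wise upper bound on the acyclic part is simple to state in code. The following Python sketch uses the hypothetical helper names `upper_bound_vector` and `sum_vectors`; the path effects are assumed to be given as tuples of equal length.

```python
def upper_bound_vector(path_effects):
    """Component-wise maximum over the effects of all acyclic paths of one process."""
    return tuple(max(column) for column in zip(*path_effects))

def sum_vectors(vectors):
    """Component-wise sum, used to add up the bounds r_i of all processes."""
    return tuple(sum(column) for column in zip(*vectors))

# The two acyclic paths from the example in the text.
print(upper_bound_vector([(3, 1), (1, 2)]))
```

Note that the resulting vector need not be the effect of any actual path, which is why this step is an overapproximation.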



Example. Having established boundedness of the example of Figure 1, we now compute 
the estimated upper bound for each buffer. First we compute the effect vectors for all 
non-cyclic paths. They are listed in Table 1 where init and init' are the initial states of the 
state machines. Then we take the maxima of the individual components from those effect 
vectors and construct the overapproximated maximal effect vectors for process Left as 
$r_1 = (2, 5, 0)$ and for Right as $r_2 = (0, 0, 2)$. Thus the sum is $\sum_{i=1}^{n} r_i = (2, 5, 2)$. 
We obtain the following two optimization problems (1-4 and 5-8) for the two buffers 
left-to-right and right-to-left: 



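$$
\begin{array}{lrclr}
\max:\ 2 - 2x_1 + x_2 & (1) & \qquad & \max:\ 7 + 5x_1 - 2x_2 & (5) \\
2 + 4x_1 - x_2 \ge 0 & (2) & & 2 + 4x_1 - x_2 \ge 0 & (6) \\
5 + x_1 - x_2 \ge 0 & (3) & & 5 + x_1 - x_2 \ge 0 & (7) \\
2 - 2x_1 + x_2 \ge 0. & (4) & & 2 - 2x_1 + x_2 \ge 0. & (8)
\end{array}
$$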



Linear Programming returns a value of 6 for the objective function (1) and a value of 
18 for the objective function (5). These values represent the estimated bounds for the 
communication buffers 1 and 2, respectively. 
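
The two reported values can be checked with a small Python sketch. It is an illustration only and not the solver used by IBOC: since there are just two variables, it enumerates the vertices of the feasible polygon instead of running a linear programming algorithm. It assumes that the feasible region is bounded (which holds here) and that $x_1, x_2 \ge 0$, as required of the coefficients in Section 2. The function name `maximize_2d` and the encoding of a linear form $c_0 + c_1 x_1 + c_2 x_2$ as a triple are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def maximize_2d(objective, constraints):
    """Maximize c0 + c1*x1 + c2*x2 subject to a0 + a1*x1 + a2*x2 >= 0 for every row.

    Assumes a bounded feasible region, so the optimum is attained at a vertex.
    """
    best = None
    for (a0, a1, a2), (b0, b1, b2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines define no vertex
        # Intersection of the two boundary lines (Cramer's rule).
        x1 = Fraction(a2 * b0 - a0 * b2, det)
        x2 = Fraction(a0 * b1 - a1 * b0, det)
        if all(c0 + c1 * x1 + c2 * x2 >= 0 for c0, c1, c2 in constraints):
            value = objective[0] + objective[1] * x1 + objective[2] * x2
            best = value if best is None else max(best, value)
    return best

r = (2, 5, 2)                          # sum of the acyclic bounds r_1 + r_2
v1, v2 = (4, 1, -2), (-1, -1, 1)       # effect vectors of the simple cycles
rows = [(r[j], v1[j], v2[j]) for j in range(3)]   # constraints (2)-(4)
constraints = rows + [(0, 1, 0), (0, 0, 1)]       # plus x1 >= 0 and x2 >= 0
objective_1 = rows[2]                                         # 2 - 2*x1 + x2
objective_5 = tuple(a + b for a, b in zip(rows[0], rows[1]))  # 7 + 5*x1 - 2*x2
print(maximize_2d(objective_1, constraints), maximize_2d(objective_5, constraints))
```

Under these assumptions the maxima are attained at $(x_1, x_2) = (1, 6)$ for objective (1) and at $(7, 12)$ for objective (5).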



Table 1. The Effect Vectors for all Non-Cyclic Paths in Figure 1 

The non-cyclic path      The effect vectors    The non-cyclic path      The effect vectors 
<init, s1>               (0, 0, 0)             <init, s1, s2>           (0, 2, -1) 
<init, s1, s2, s3>       (2, 3, -2)            <init, s1, s3>           (0, 5, -1) 
<init, s1, s3, s2>       (2, 5, -2)            <init', s4>              (0, 0, 2) 
                                               <init', s4, s5>          (-1, 0, 2) 


3 Promela-Specific Issues 



Given a Promela model the first step in the model analysis is to extract from it a system of 
CFSMs that consists of all state machines of all potentially executing processes. SPIN can 
automatically generate a state machine representation for every proctype definition in the 
model (see footnote 1). The state machine of each actually instantiated process is then obtained 
from the state machine of its process type by replacing all formal arguments by the corresponding 
actual arguments. Figure 2 shows a simple Promela model and the corresponding system 
of CFSMs. There are two process instances at run time, one of process type P and the 
other of process type Q. Each transition in the state machine representation corresponds 
to a basic statement in the Promela code (see footnote 2) of its process type. The source state of the 
transition denotes the entry point of the corresponding statement and the target state 
denotes the exit point of the statement. Every transition has a guard which is determined 
by the implicit executability condition of the corresponding statement. The executability 
of a statement describes under which condition the statement is executable. For instance, 
the transition from the state 4 to the state 3 is labeled by its corresponding statement 
C?msg1. The statement is executable if and only if there is a message msg1 available 
in the channel C. 



mtype = { msg0, msg1 }; 
chan C = [2] of { mtype }; 
active proctype P(){ 
    C!msg0; 
    do 
    :: C?msg1 -> C!msg0 
    od} 

active proctype Q(){ 
    C!msg1; 
    do 
    :: C?msg0 -> C!msg1 
    od} 




Fig. 2. A simple Promela model and the corresponding state machines. 



We apply code abstraction to the resulting system of CFSMs. We replace all the 
statements in the state machines by their message passing effects. The resulting system 
is called the system of effect graphs. The remaining steps of the analysis, including cycle 
detection and translating the summary effects of all the cycles into a linear programming 
problem, are not different from the corresponding steps for UML RT models as described 
in [10]. In the remainder of this section we discuss several issues that have to be addressed 
during the Promela code abstraction in order to extract effect vectors from the Promela 
code that ensure an overapproximation of the system. The resolution of these issues 
greatly influences the resulting systems of effect graphs, in particular how coarse the 
overapproximation is. 



1 This is accomplished by invoking the spin -d option. 

2 A basic statement is defined in the Promela language as an indivisible statement such as an 
assignment statement or a receive statement. 
3.1 Identifying Message Types 

Each component of an effect vector corresponds to a distinct message type. To construct 
effect vectors we must determine all distinct message types occurring in the model. 
Before we discuss this problem, we first review the syntactic structure of messages and 
the message passing semantics in Promela. The structures of the messages are defined 
in the declarations of the channels that store them. Consider the following channel 
declarations: 

chan C1 = [2] of { mtype, int, bool }; 
chan C2 = [2] of { int, bool }; 



Messages can contain any finite number of fields. Each field can be of any well-formed 
type except arrays. In particular, the channel type chan is allowed. There is a special type 
called mtype that contains user-defined symbolic constants. In the Promela model in 
Figure 2 mtype is declared as containing two constants: msg0 and msg1. In any model, 
there can be at most one mtype declaration. An mtype field is not necessarily contained 
in every message. For instance, the messages in the channel C2 only contain an integer 
field and a boolean field. 

Consider a send statement of the form $Ch!e_1, e_2, \ldots, e_n$ where each $e_i$ ($1 \le i \le n$) 
can be an arbitrary Promela expression. The statement is executable if the channel Ch 
is not full. Nevertheless, under our a priori assumption of unboundedness of channels, 
the send statement is never blocked. The send statement sends the message 
$(val(e_1), val(e_2), \ldots, val(e_n))$ to the channel, where $val(e_i)$ is the run-time value of 
$e_i$. 

For a receive statement $Ch?e_1, e_2, \ldots, e_n$ (see footnote 3), each $e_i$ ($1 \le i \le n$) can be a variable, 
the evaluation eval(v) of some variable v, or a constant. eval(v) can be regarded as 
a constant, but its value is unknown statically. The statement is executable if all the 
constant expressions match the respective fields in the available message. Otherwise, it is 
blocked. If $e_i$ is some variable v then, if the statement is executable, the corresponding 
field of the received message is stored into v. 

The constants in a receive statement distinguish among messages containing different 
values of the relative fields. This allows users to associate with each message a proper 
piece of code for manipulation. We observe that these constants are often of type mtype. 
The variables in a receive statement can never block the execution. They are used to 
retrieve the relative fields from the received message. Consider the Promela model in 
Figure 3. The two receive statements in the model use constants of type mtype to 
discriminate between the messages of type exact and inexact. The variable x stores the 
data upon receiving, no matter whether the received message type is exact or inexact, 
and no matter which integer is transmitted with the message. 

For the purpose of our analysis one can regard two messages which have the same 
structure but disagree on at least one field as being of different types. However, the 
number of the message types recognized in this way can be very large. It is also not 
necessary based on the following observation. In the previous example, the messages 

3 Promela knows additional forms of the send and receive primitives, denoted by !! and ??, 
respectively. They are variants of the basic send and receive statement and only differ in the 
order in which they add and remove messages from the communication channels. Since our 
analysis abstracts from the order of messages in the buffer anyway, we do not distinguish these 
primitives here and map them to their respective base form. 
mtype = {exact, inexact}; 
chan C = [2] of {mtype, int}; 
active proctype P(){ 
    int x; 
    do 
    :: C?exact, x -> keep(x) 
    :: C?inexact, x -> dump(x) 
    od 
} 



Fig. 3. Receive statements 




Fig. 4. Nondeterministic effect graph 



mtype = {msg0, msg1}; 
chan C = [2] of {mtype}; 
chan D = [2] of {mtype}; 
proctype P(chan X; chan Y){ 
    do 
    :: X?msg0; Y!msg0; Y!msg0 
    od} 

init{ run P(C, D); 
      run P(D, D)} 



Fig. 5. Channel parameters. 



(exact, 5) and (inexact, 5) must be of different types because the model discriminates 
between them. On the contrary, the messages (exact, 5) and (exact, 7) need not be of 
different types since the model treats them in exactly the same way. Only the first field 
of the messages is used to indicate the message types. 

Based on the discussion so far, we propose the following solution to identify message 
types. For any channel Ch that stores messages with mtype fields, we identify the types 
of the messages in Ch as pairs (Ch, mconst) where mconst is an mtype constant. We 
include the channel name into message types because more than one channel can store 
messages with the same structure, and messages exchanged in different channels must 
be distinguished. For any channel Ch that stores messages without mtype field, there is 
only one type, denoted as (Ch), for all the messages in Ch. In this way there are two 
message types identified for the model in Figure 3: (C, exact) and (C, inexact). This 
solution simply abstracts away all the non-mtype fields of messages. 
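
A minimal Python sketch of this coarse identification follows. The representation of channel declarations as a dictionary from channel names to lists of field type names, and the function name `coarse_message_types`, are assumptions made for illustration.

```python
def coarse_message_types(channels, mtype_constants):
    """Identify message types, abstracting away every non-mtype field."""
    types = []
    for name, fields in channels.items():
        if "mtype" in fields:
            # One message type (Ch, mconst) per mtype constant.
            types.extend((name, const) for const in mtype_constants)
        else:
            # A channel without an mtype field has the single type (Ch).
            types.append((name,))
    return types

# The channel of Figure 3.
print(coarse_message_types({"C": ["mtype", "int"]}, ["exact", "inexact"]))
```

Each returned tuple corresponds to one component of the effect vectors.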

This abstraction approach becomes coarse if, in any model, some constants in some 
receive statements are of other types than mtype. For instance, if we have a receive 
statement C?exact, 5 in the model in Figure 3, the model discriminates between the 
messages containing the integer 5 and the messages containing other integers. Then the 
previously identified types are not sufficient to distinguish messages such as (exact, 5) 
and (exact, 7). Thus, for any channel Ch and receive statement $Ch?e_1, e_2, \ldots, e_i, \ldots, e_n$, 
where $e_i$ is a constant $c_i$, those messages containing $c_i$ in their i-th fields and those 
messages containing other constants in their i-th fields must be of different message 
types. 

We propose a finer abstraction as follows. For any channel Ch that is declared 
as chan Ch = [k] of {type_1, type_2, ..., type_n}, the message types are those 
$(Ch, \langle d_1, d_2, \ldots, d_n \rangle)$ such that 

- $D_i$ is the domain of the type $type_i$. 

- $P(D_i)$ is a partition over $D_i$ and $P(D_i) = \{\{const_1\}, \{const_2\}, \ldots, \{const_m\}, D_i - \bigcup_{j=1}^{m} \{const_j\}\}$ 
such that each constant $const_j \in D_i$ ($1 \le j \le m$) is the i-th 
expression in some receive statement $Ch?e_1, e_2, \ldots, const_j, \ldots, e_n$. 

- $d_i \in P(D_i)$. 



For the example in Figure 3 augmented with the receive statement C?exact, 5, assuming 
I as the integer domain, we then have four message types as follows: (C, <exact, {5}>), 
(C, <exact, I − {5}>), (C, <inexact, {5}>) and (C, <inexact, I − {5}>). 
3.2 Tracking mtype Variables 

If an expression in a send or receive statement is a variable or the evaluation of a variable, 
its run-time value is not known statically. This affects our abstraction when the relative 
field of messages is used to identify the message types. For instance, assume that we 
have a send statement C!x, y in the model in Figure 3 where x is an mtype variable and y 
is an integer variable. We identify all the message types as (C, exact) and (C, inexact). 
In other words, the integer field of messages is abstracted away. Which type the sent 
message has depends only on the variable x. Since we cannot generally determine 
the run-time values for x, we model the statement as sending nondeterministically any 
message whose mtype field is either exact or inexact. Therefore, in the resulting effect 
graph, there are two transitions. Both are leaving from the state corresponding to the 
entry point of the statement and lead to the state corresponding to the exit point, as shown 
in Figure 4. The left transition is labeled by the effect (1, 0) denoting that a message of 
the type (C, exact) is sent. The right transition is labeled by the effect (0, 1), denoting 
that a message of the type (C, inexact) is sent. Thus one obtains a nondeterministic 
choice between two transitions, as shown in Figure 4. 
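
The nondeterministic modeling of such a send statement can be sketched in Python. The function name `send_effects` and the list encoding of message types are assumptions for illustration.

```python
def send_effects(channel, possible_constants, message_types):
    """One effect vector per mtype constant that the sent variable may hold."""
    effects = []
    for const in possible_constants:
        vector = [0] * len(message_types)
        vector[message_types.index((channel, const))] = 1  # one message is added
        effects.append(tuple(vector))
    return effects

message_types = [("C", "exact"), ("C", "inexact")]
# The statement C!x, y where x may hold either mtype constant.
print(send_effects("C", ["exact", "inexact"], message_types))
```

Each returned vector labels one of the parallel transitions between the entry and exit states of the statement.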

As mentioned before, there is at most one mtype declaration in a model. All constants 
of type mtype can be used by any send or receive operation on any channel. However, 
most channels usually use only a small portion of all mtype constants. If we can determine 
for each channel the range of the mtype constants it uses, we only need to consider 
those constants in the range for the nondeterministic modeling of message passing. So 
we could obtain a finer overapproximation. Our approach works by statically tracking 
possible values of mtype variables. mtype constants are numerical symbols. The Promela 
language allows for arithmetic operations on mtype constants or variables. If mtype = 
{msg0, msg1}, msg0 and msg1 are internally represented by the compiler as integers 
2 and 1, respectively. The statement v = msg0 + msg1 is syntactically valid and the 
mtype variable v is assigned the value 3. Note that this value is outside the range of 
the integers for representing an mtype constant in this example. However, SPIN does not 
report such a range error. Arithmetic operations over the mtype domain make it extremely 
hard to track mtype variables. Due to these reasons, we exclude the usage of arithmetic 
operations over mtype from our analysis. Hence there are only three ways to change 
the value of a mtype variable: through an assignment, through a receive statement, or 
through argument passing. 

We propose a solution to determine the ranges for the channels for the coarser 
approach of identifying message types. That means all non-mtype fields of messages 
are abstracted away. The rules for updating the ranges for mtype variables and channels 
are given as follows, where $v_1$ and $v_2$ are mtype variables, mconst is an mtype constant, 
and Ch is a channel: 

- Initially all the mtype variables and channels have the empty set as their ranges. 

- For the assignment $v_1$ = mconst, add mconst to the range of $v_1$. 

- For the assignment $v_1 = v_2$, add all the constants in the range of $v_2$ to the range of 
$v_1$. 

- For the send statement $Ch!e_1, \ldots, v_1, \ldots, e_n$, add all the constants in the range of $v_1$ 
to the range of Ch. 

- For the send statement $Ch!e_1, \ldots, mconst, \ldots, e_n$, add mconst to the range of Ch. 

- For the receive statement $Ch?e_1, \ldots, v_1, \ldots, e_n$, add all the constants in the range of 
Ch to the range of $v_1$. 

- Assume a process type defined as proctype P(...; mtype $v_1$; ...). For run 
P(..., $v_2$, ...), add all the constants in the range of $v_2$ to the range of $v_1$. 

- Assume a process type defined as proctype P(...; mtype $v_1$; ...). For run 
P(..., mconst, ...), add mconst to the range of $v_1$. 

After determining the range of a channel Ch, we may reduce the number of distinct 
message types. For instance, if the domain of the mtype constants is $D$ and the range 
of Ch is $D_{Ch} \subseteq D$, then any message type (Ch, mconst) can be discarded if 
$mconst \in D - D_{Ch}$. 
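
The update rules can be read as a flow-insensitive propagation that is repeated until no range changes any more. The text does not spell out the iteration strategy, so the fixpoint loop in the following Python sketch is an assumption, as are the function name `track_ranges` and the encoding of each relevant statement as a triple (kind, target, source).

```python
from collections import defaultdict

def track_ranges(statements):
    """Propagate mtype constants into variable and channel ranges until stable.

    Each statement is (kind, target, source); the range of target grows by
    the constant source (kinds ending in "_const") or by the range of source.
    """
    ranges = defaultdict(set)  # all ranges start out empty
    changed = True
    while changed:
        changed = False
        for kind, target, source in statements:
            if kind in ("assign_const", "send_const", "run_const"):
                new = {source}
            else:  # assign_var, send_var, recv_var, run_var
                new = set(ranges[source])
            if not new <= ranges[target]:
                ranges[target] |= new
                changed = True
    return dict(ranges)

statements = [
    ("recv_var", "y", "C"),          # C?y    : range(C) flows into y
    ("send_var", "C", "x"),          # C!x    : range(x) flows into C
    ("assign_const", "x", "msg0"),   # x = msg0
    ("send_const", "D", "msg1"),     # D!msg1
]
ranges = track_ranges(statements)
domain = {"msg0", "msg1"}
discarded = sorted((ch, c) for ch in ("C", "D") for c in domain - ranges[ch])
print(ranges, discarded)
```

The loop terminates because ranges only grow and are subsets of the finite set of mtype constants.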

3.3 Channel Arguments 

In Promela models, process types can be parameterized. For any process type that has 
formal arguments as channels, its instances with different instantiations of the channel 
arguments have different message passing behaviors. Consider the Promela model in 
Figure 5, where two running instances of process type P are created. We refer to them as $p_1$ 
and $p_2$. The process $p_1$, instantiated as P(C, D), accepts two different channels as the 
actual arguments. The process $p_2$, instantiated as P(D, D), accepts the same channel D 
for both the formal arguments X and Y. We can easily observe that $p_1$ alone does not 
cause any unboundedness, while $p_2$ floods the channel D with messages msg0. 
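
The difference between the two instances becomes visible once the formal channel arguments in the loop of P are replaced by the actual ones. The following Python sketch computes the effect vector of the loop for both instantiations; the encoding of a cycle as a list of (operator, formal channel, constant) triples and the name `instantiate_cycle` are hypothetical.

```python
def instantiate_cycle(cycle, binding, message_types):
    """Effect vector of a cycle after replacing formal channels by actual ones."""
    effect = [0] * len(message_types)
    for op, formal, const in cycle:
        index = message_types.index((binding[formal], const))
        effect[index] += 1 if op == "!" else -1  # send adds, receive removes
    return tuple(effect)

# The loop of P in Figure 5: X?msg0; Y!msg0; Y!msg0
cycle_P = [("?", "X", "msg0"), ("!", "Y", "msg0"), ("!", "Y", "msg0")]
message_types = [("C", "msg0"), ("D", "msg0")]
effect_p1 = instantiate_cycle(cycle_P, {"X": "C", "Y": "D"}, message_types)
effect_p2 = instantiate_cycle(cycle_P, {"X": "D", "Y": "D"}, message_types)
print(effect_p1, effect_p2)
```

The effect of the first instance has a negative component, so repeating its cycle alone cannot satisfy the condition of Section 2, whereas the effect of the second instance is non-negative and strictly positive on (D, msg0).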



3.4 Replication of Proctypes 

As shown above, different instances of some proctype with different channel arguments 
differ w.r.t. the boundedness of the channels. However, several parallel instances of 
some proctype with the same channel arguments do not contribute more to a potential 
unboundedness than just one, as far as our analysis is concerned. This is because in our 
abstraction level 4 (see Section 2) we assume all control-flow cycles to be independent. 
Thus two parallel copies of a proctype do not contribute more different control-flow 
cycles than just one. 

3.5 Channel Assignments 

The channel names specified in a Promela model are actually variables of the type chan.
At run time, SPIN maintains a set of actual channels called queues, and each channel
variable keeps a pointer to a specific queue. The queue pointed to by a channel variable
can be changed through channel assignments. Consider the Promela model in Figure
6. The channels C and D initially point to two separate queues. After the assignment
C = D, C points to the same queue as D. It is easy to see that this queue is
flooded by messages msg0.

A simple abstraction works as follows. Wherever we find a channel assignment
Ch_1 = Ch_2 in the model, we merge the channels Ch_1 and Ch_2 into a single channel.
That means one does not discriminate between the messages in Ch_1 and the messages in
Ch_2. This is an overapproximation since we abstract from message orders. This solution
is relatively coarse because a channel assignment does not necessarily affect all parts of

mtype = { msg0, msg1 };
chan C = [2] of { mtype };
chan D = [2] of { mtype };
active proctype P() {
    C = D;
    do
    :: C?msg0; D!msg0; D!msg0
    od
}



Fig. 6. A Promela model 



mtype = { msg0, msg1 };
chan C = [2] of { mtype };
chan D = [2] of { mtype };
active proctype P() {
    do
    :: C?msg0; D!msg0; D!msg0
    od;
    C = D
}



Fig. 7. A Promela model 



the model. Consider the Promela model in Figure 7. Apparently the channel assignment 
C = D does not affect the loop in the process type P where C and D still point to 
separate queues. 

We propose a finer overapproximation based on the notion of strongly connected
components (SCCs). An SCC in a directed graph is a maximal subgraph in which any vertex is
reachable from any other vertex. If we collapse all the vertices in the same SCC into a
single vertex, we obtain a directed acyclic graph (DAG). In the DAG each vertex denotes
an SCC in the original graph. Each transition from the state SCC_1 to the state SCC_2
corresponds to a transition in the original graph from one of the states in SCC_1 to one of
the states in SCC_2. We derive the DAGs from the state machines of the running processes
that contain channel assignments. Clearly, a channel assignment in some SCC
can only affect those SCCs reachable from it in the DAG of SCCs. For parallel processes,
a channel assignment in one process can affect every part of every other process running
in parallel. In this setting the effect vectors are constructed with separate components
for different channels. However, at program locations where two channels are possibly
identical, messages are nondeterministically sent to either channel. This encoding has
the same effect as the unification of channels described above.
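The SCC-based refinement can be sketched as follows. The Python code below is an illustration under assumptions: a state machine is given as an adjacency dictionary, the channel assignment is identified by its source state, and the state numbering of the two example graphs only loosely follows Figures 6 and 7. Kosaraju's algorithm is used here for convenience; any SCC algorithm would do.

```python
def scc_of(graph):
    """Kosaraju's algorithm; graph maps every vertex to a list of successors.
    Returns a dict mapping each vertex to a representative of its SCC."""
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            vertex, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(vertex)  # all successors explored
                stack.pop()
    reverse = {v: [] for v in graph}
    for v, successors in graph.items():
        for w in successors:
            reverse[w].append(v)
    component = {}
    for root in reversed(order):
        if root in component:
            continue
        component[root] = root
        todo = [root]
        while todo:
            v = todo.pop()
            for w in reverse[v]:
                if w not in component:
                    component[w] = root
                    todo.append(w)
    return component


def affected_states(graph, assignment_source):
    """States in SCCs reachable, in the DAG of SCCs, from the SCC that
    contains the source state of the channel assignment."""
    component = scc_of(graph)
    dag = {c: set() for c in component.values()}
    for v, successors in graph.items():
        for w in successors:
            if component[v] != component[w]:
                dag[component[v]].add(component[w])
    start = component[assignment_source]
    reached, todo = {start}, [start]
    while todo:
        c = todo.pop()
        for d in dag[c] - reached:
            reached.add(d)
            todo.append(d)
    return {v for v in graph if component[v] in reached}


# Assignment before the loop: 0 -(C = D)-> 1, loop 1 -> 2 -> 3 -> 1.
before_loop = {0: [1], 1: [2], 2: [3], 3: [1]}
# Assignment after the loop: loop 0 -> 1 -> 2 -> 0, exit 0 -> 3, 3 -(C = D)-> 4.
after_loop = {0: [1, 3], 1: [2], 2: [0], 3: [4], 4: []}
```

For `before_loop`, `affected_states(before_loop, 0)` covers the loop states, so C and D must be treated as possibly identical there. For `after_loop`, `affected_states(after_loop, 3)` contains only states 3 and 4, so the loop keeps separate effect components for C and D.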

3.6 Channel Arrays 

A set of channels can be declared as an array, e.g., chan C[3] = [5] of { mtype }. The
channel array C consists of three channels indexed by integers between 0 and 2. For
instance, the statement C[1]!msg sends a message to the channel at index 1. When an
index is a variable, its value is generally not known statically. For instance, the statement
C[i]!msg uses an integer variable i to index the array. A simple solution is to
model the statement as nondeterministically sending the message to any element of C,
assuming that the run-time values of i always fall inside the range of the channel array
indices. A finer approach is to statically track the index variable i to determine its range,
in a similar way as for mtype variables. However, we cannot exclude arithmetic
operations over integers. Whenever an arithmetic expression is encountered in an assignment,
in a receive statement, or in argument passing, we have to set the range of the affected
variable to the range of the channel array indices.
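The index tracking described above can be illustrated by a small Python sketch. It assumes a hypothetical, pre-parsed statement encoding with three kinds of statements: a constant assignment, a copy from another variable (which also stands for argument passing), and an arithmetic expression, which widens the range to all array indices.

```python
def index_ranges(array_size, statements):
    """Over-approximate the values of integer index variables.

    statements is a list of tuples (hypothetical encoding):
      ("const", var, k)    var = k
      ("copy", dst, src)   dst = src, or src passed for formal parameter dst
      ("arith", var)       var = <arithmetic expression>
    """
    full = set(range(array_size))  # range of the channel array indices
    ranges = {}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for statement in statements:
            kind, target = statement[0], statement[1]
            if kind == "const":
                new = {statement[2]}
            elif kind == "copy":
                new = ranges.get(statement[2], set())
            elif kind == "arith":
                new = full  # nothing is known: assume any valid index
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            current = ranges.setdefault(target, set())
            if not new <= current:
                current |= new
                changed = True
    return ranges


# chan C[3]; i = 0; i = 2; j = i; k = k + 1
ranges = index_ranges(3, [("const", "i", 0), ("const", "i", 2),
                          ("copy", "j", "i"), ("arith", "k")])
# C[j]!msg is then modelled as a nondeterministic send to C[0] or C[2].
targets_of_j = sorted(ranges["j"])
```

A send through C[k] would still have to be modelled as a send to any of the three channels, since k is assigned an arithmetic expression.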

3.7 Unbounded Process Creations 

The SPIN model checker limits the number of parallel processes to an
implementation-dependent constant, which is 255 in most installations. If one takes such a limit for
granted then process creation alone could not lead to channel unboundedness. However,
the Promela language could just as well be interpreted without this limitation. Here we
show how unbounded parallel process creation could lead to channel unboundedness
(even in the absence of cycles in the control-flow graphs), and how our method could
handle this problem.

Unbounded process creations can result in unbounded channels. There are two kinds
of unbounded process creations. One kind is through local loops, as demonstrated by
the Promela model in Figure 8. The instance p of the process type P repeatedly creates
instances of the process type Q. There is an execution where every new instance of Q
immediately sends a message msg0 to the channel C after p creates it, and then stops
there, allowing p to create another new instance of Q. This floods C with messages msg0.



mtype = { msg0, msg1 };
chan C = [2] of { mtype };
proctype Q() {
    C!msg0;
    C?msg1;
}
active proctype P() {
    do
    :: run Q()
    od;
}




Fig. 8. A Promela model (left), the state machine of a running process of Q (middle), and the 
modified state machine with replication transitions (right). 



The unboundedness of C cannot be detected from the state machine of any instance
of Q as shown in the middle of the figure. A straightforward solution is to add an 
extra backward transition from each state in the state machine to the initial state. These 
transitions are called replication transitions. The modified state machine is on the right 
in the figure. Now we can determine the loop (/, 1,1) as the cause of the flooding of 
channel C. 
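To make the role of replication transitions concrete, the following Python sketch adds them to the state machine of Q and enumerates the simple cycles together with their summed message effects. The state numbers 0, 1, 2, the dictionary encoding of effects, and the function names are assumptions of this sketch; the final filter is a deliberately crude stand-in for the linear programming test of the actual analysis.

```python
def add_replication_transitions(transitions, states, initial):
    """Add an effect-free backward transition from each state to the initial state.

    The initial state itself is skipped (an assumption of this sketch) because
    a self-loop without any message effect cannot change the result.
    """
    extra = [(s, initial, {}) for s in sorted(states) if s != initial]
    return list(transitions) + extra


def simple_cycle_effects(transitions, states):
    """Return (cycle, effect) pairs for all simple cycles.

    Each transition is (source, target, effect), where effect maps a message
    type to the number of messages added (+) or removed (-).
    """
    outgoing = {s: [] for s in states}
    for source, target, effect in transitions:
        outgoing[source].append((target, effect))
    found = []

    def extend(start, current, path, total):
        for target, effect in outgoing[current]:
            summed = dict(total)
            for message, delta in effect.items():
                summed[message] = summed.get(message, 0) + delta
            if target == start:
                found.append((tuple(path), summed))
            elif target > start and target not in path:
                # Only visit larger states so each cycle is reported once.
                extend(start, target, path + [target], summed)

    for start in sorted(states):
        extend(start, start, [start], {})
    return found


# State machine of Q from Figure 8, with assumed state numbers 0, 1, 2.
states = {0, 1, 2}
q_machine = [(0, 1, {"msg0": +1}),   # C!msg0
             (1, 2, {"msg1": -1})]   # C?msg1
cycles = simple_cycle_effects(
    add_replication_transitions(q_machine, states, 0), states)
# Cycles that only add messages are candidate causes of flooding.
flooding = [cycle for cycle, effect in cycles
            if effect and all(delta > 0 for delta in effect.values())]
```

Without replication transitions the state machine of Q is acyclic and no cycle is reported. With them, the cycle through states 0 and 1 has the effect of adding one msg0 per iteration, which matches the flooding execution described above.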

The other kind of unbounded process creations is through self-creations or mutual
creations. An example of self-creation is the Promela model in Figure 9. The channel
C is unbounded because any new instance of P is self-created and every instance sends
a message to the channel. The unboundedness is not detected from the state machine
of any running process of P as shown in the middle of the figure. Similarly, we use
replication transitions to detect self-creations. But we only need to add replication
transitions for those states corresponding to the entry points of self-creations. So in the
modified state machine on the right of the figure, there is no backward transition from
state 2. The execution of any instance of P can never reach that state before it creates a new
instance.

Now consider the situation that the unbounded process creations are through mutual
invocation, as demonstrated by the Promela model in Figure 10. An instance of P creates
an instance of Q that creates an instance of R. This instance of R in turn creates another
new instance of P. The channel C is flooded with messages msg0 at run time.

mtype = { msg0, msg1 };
chan C = [2] of { mtype };
active proctype P() {
    C!msg0;
    run P()
}









Fig. 9. A Promela model (left), the state machine of a running process of P (middle), and the 
modified state machine with replication transitions (right). 



One way of dealing with this is to use replication transitions again to detect the
unboundedness caused by mutual creations. But for any process, the replication transition
would not target its own initial state since it is not self-created. Instead, for instance,
the replication transition from state 1 of the state machine of a P instance transfers
control to the initial state of the state machine of a Q instance. In this way, several
previously independent state machines are merged into one much larger state machine.
Thereby our boundedness analysis would no longer be local to individual processes, but
would have to consider the complete system, which would significantly increase its complexity.
In fact, unlike before, the time required would then be exponential in the number of
parallel components, even if each individual component contains only a polynomial
number of simple cycles.



mtype = { msg0, msg1 };
chan C = [2] of { mtype };
proctype P() {
    C!msg0;
    run Q()
}
proctype Q() {
    C?msg1;
    run R()
}
proctype R() {
    C!msg1;
    run P()
}

Fig. 10. A Promela model (left), the united state machines with replication transitions (right). 




A second solution would be to add local replication transitions, just like in Figure 9,
in every process which can, directly or indirectly, create itself. This is a safe, but coarser,
overapproximation of the solution in Figure 10. The advantage is that one avoids
inter-process transitions and thus keeps the boundedness analysis local and efficient.
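Deciding which proctypes need local replication transitions is a reachability question on the creation graph. The Python sketch below assumes that the run statements have already been collected into a dictionary from each proctype to the proctypes it creates; the function name is hypothetical.

```python
def self_creating_proctypes(creates):
    """Proctypes that can create an instance of themselves, directly or
    indirectly. creates maps each proctype to the set of proctypes that
    appear in its run statements."""
    result = set()
    for proctype in creates:
        seen, todo = set(), list(creates[proctype])
        while todo:
            created = todo.pop()
            if created in seen:
                continue
            seen.add(created)
            todo.extend(creates.get(created, ()))
        if proctype in seen:
            result.add(proctype)
    return result


# Creation graphs of the models in Figures 8, 9 and 10.
figure_8 = {"P": {"Q"}, "Q": set()}
figure_9 = {"P": {"P"}}
figure_10 = {"P": {"Q"}, "Q": {"R"}, "R": {"P"}}
```

For Figure 10 all three proctypes are reported, so each would receive local replication transitions. For Figure 8 none is reported, because the unbounded creation there stems from a local loop rather than from self-creation.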



4 Experimental Results 

In this section we present experimental results using a prototype implementation of our
analysis algorithms in the IBOC tool. As case studies we use the 2-Proctype model that is
given in Figure 11, a Promela model of the Alternating Bit Protocol, and a Promela model
of the CORBA General Inter-ORB Protocol (GIOP) [9] (see footnote 4). IBOC uses the LPSOLVE tool
for the linear programming tasks. All experiments were performed on a two-processor
1 GHz Pentium III PC with 2 GB of memory. Table 2 gives some statistics regarding
the complexity of these models as well as the computational effort for the analysis with
IBOC.



mtype = { c, b, a };
chan AB = [25] of { mtype };
chan BA = [25] of { mtype };
active proctype A(){ 
si: if 

: BA?c -> AB!b; AB!b; 

: BA?c -> AB!b; AB!b; 
fi; 
s2 : if 

: BA?c -> AB!a; AB ! a; 



: : BA?c -> AB!a; AB ! a; 
fi> 



goto s2 

AB ! b ; AB ! b ; AB ! b ; goto s3 



AB ! b ; goto s3 






active proctype B(){
    BA!c;
    BA!c;
    goto s4;
s4: if
    :: AB?a -> goto s5
    fi;
s5: if
    :: AB?b -> BA!c; goto s4
    fi;
}



Fig. 11. The 2-Proctype Promela Model. 



Table 2. Model Complexity Statistics and Computational Effort for Analysis 





                                       2-Proctype   Alternating Bit   CORBA GIOP
Processes                                       2                 3            5
States                                         20                12          135
Transitions                                    21                22          163
Message types                                   3                 8          108
Channels                                        2                 4           11
Reported cycles                                 3                13         2638
Generated vectors                               2                13          752
Runtime for cycle detection [sec.]          0.062             0.218        7.187
Runtime for boundedness check [sec.]        0.000             0.031        0.098
Runtime for computing bounds [sec.]         0.032                 -        1.265



In IBOC, the full range of abstractions as discussed in Section 3 is not yet
implemented. We use the finer abstraction proposed in Section 3.1 to identify message types.
We track mtype variables as discussed in Section 3.2. Since we cannot exclude
arithmetic operations over the integer domain, whenever we encounter an arithmetic
expression in an assignment to a variable, we set the integer domain as the range of the
variable. We collect all process creation statements to record channel argument passing
for each running process. We adopt the coarser of the abstractions proposed in Section 3.5
to deal with channel assignments. We use some tracking of integer variables to narrow
down channel array index values. We do not consider the unbounded process creation
problem at all since SPIN allows no more than 255 concurrent proctype instances.

4 The Promela sources for the IBOC model that we use are freely available from URL 
http://tele.informatik.uni-freiburg.de/leue/sources/giop/giop-sttt.tar. 

IBOC returned a result of "BOUNDED" for the 2-Proctype model. The estimated
bounds were 20 for the channel AB and 6 for the channel BA, which allows us to reduce
the channel sizes compared to the values in the original model. This entails a significant
state space reduction. Note that neither the boundedness result nor the estimated bounds
could easily be derived from the model by manual inspection.

"UNKNOWN" was returned for the model of the Alternating Bit Protocol. When 
such a verdict is returned, IBOC indicates control flow cycles that possibly contribute 
to the unbounded growth of a channel. For this model IBOC identified the cycle in 
which the sender sends messages to the system as a potential source of unboundedness. 
This is quite plausible since the sender can flood the Alternating Bit Protocol in an
unconstrained fashion. In the sequel we will refer to the potentially unbounded cycles
that IBOC identifies as counterexamples. 

The GIOP model is a real-life communication protocol with significant complexity.
Amongst other features, it supports server object migration between different Object
Request Brokers (ORBs). IBOC returned an "UNKNOWN" result for the GIOP model
and provided two counterexamples within a very reasonable runtime. One counterexample
is the cycle where a GIOP client ORB forwards a user request message to the
GIOP agent ORB. The execution of this cycle can cause unboundedness if there is an
unbounded number of user requests. The other counterexample is the cycle where server
objects register their migration from one ORB to another. If we allow migrations to happen
at any time, the system is flooded by an unbounded number of register messages.
We eliminated these two sources of unboundedness from the GIOP system and applied
IBOC again to the modified model. We obtained a result "BOUNDED", which indicates
that the two counterexamples were indeed the only sources of unboundedness in the
system. While some of the buffer bound estimates were larger than the ones assumed
in [9], there were also some channels with smaller estimates. For instance, the size of
the channel toServer in [9] is 3 while its estimate is 1.



5 Related Work 



There is a long history of work on the handling of infinite communication buffers in
automated system analysis. An overapproximation using the assumption that buffers may
lose messages is proposed in [1]. Sufficient syntactic conditions for the unboundedness
of communication channels in CFSM systems are proposed in [8]. There is a history of
checking properties of Petri nets using linear programming techniques (cf. [11,5]) but
these approaches do not encompass boundedness tests. We are not aware of any work
prior to ours that addresses buffer capacity estimation for CFSM-type models. Various
attempts have been made to define formal operational semantics for Promela [13,2,16].
Note that our analysis largely relies on the recognition of statically analyzable features
of Promela models, such as the control flow cycles, and many of the semantic subtleties
of Promela were abstracted away. As a consequence our work does not depend on the
availability of a completely specified and unanimously agreed semantics definition for
Promela.

6 Conclusion

We presented an incomplete test for buffer overflows in Promela models as well as a
conservative estimate for the maximal occupancy of Promela message channels. The
experimental results we presented indicate that the analysis method scales well to
problems of realistic size. We also illustrated that the analysis produces useful results; in
particular, the maximal occupancy estimates help in finding smaller models.

In comparison to the UML RT analysis presented in [10], the analysis of Promela
code imposes more challenges on the employed code abstraction techniques. Due to
the syntactically more constrained nature of UML RT, the determination of message types
and the identification of communication channels are not an issue there.

Current research focuses on the improvement of counterexample handling and the
identification of sources of unboundedness. As we discussed above, IBOC attempts to
point the user to potential sources of unboundedness. Determining the non-spuriousness
of a counterexample and the ensuing abstraction refinement are currently entirely
manual, and we are working towards more automated support for these steps.

Acknowledgements. The third author was supported through the DFG funded project 
IMCOS (grant number LE 1342/1). We thank all involved students, in particular Quang 
Minh Bui, for their effort in developing IBOC. 

Abstract. Nowadays, many distributed applications take advantage
of the transparent distributed object systems provided by CORBA
middlewares. While greatly reducing the design and coding effort,
distributed object systems may also introduce subtle faults into the
applications, which in turn complicates the validation of
the applications. In this paper, we present our work on applying SPIN
to check the correctness of the designs of CORBA-based applications,
taking into account those characteristics of CORBA that are essential to
the correctness of the applications. In doing so, we provide adaptations
to UML, so that CORBA-based applications can be modeled
with succinct yet sufficient details of the underlying middlewares. An
automated translation tool is developed to generate Promela models
from such UML design models. The translation tool embeds the
behavioral details of the middleware automatically. In this way,
software developers can stay in their comfort zone while design faults,
including those caused by the underlying distributed object systems, can
be pinpointed through verification or simulation with SPIN.

Keywords: Distributed Object Systems, Middleware, Formal Specification
and Verification, Model Checking, UML, SPIN, CORBA.



1 Introduction 

Nowadays, many distributed applications take advantage of middlewares. A
middleware encapsulates the heterogeneity of the distributed computing environment
and provides a consistent logical communication medium for distributed
applications. This greatly reduces the design and coding effort. Many general
object-oriented distributed applications choose distributed object system (DOS)
middlewares, where objects are made available by the middleware beyond process
boundaries. To promote the interoperability of DOS middlewares, a widely
accepted standard, CORBA, has been established by the OMG. Today, most DOS
middlewares are either CORBA-compatible or similar to CORBA at the architecture
level.

However, distributed object systems may also introduce subtle faults into
the applications. This is because application designers usually tend to overlook
the differences between a local object and a remote object made transparently
available. For example, phenomena like the serialization or blocking of remote
object invocations may occur due to various reasons such as the resource configuration
of the remote objects or the synchronization mode of the invocation
itself. Overlooking such phenomena may under certain circumstances affect the
business logic of the applications.

In this paper, we present our work on applying SPIN [6,7] to check the correctness
of the designs of CORBA-based applications, taking into account those
characteristics of CORBA that are essential to the correctness of the applications.

Formal verification techniques, such as the SPIN model checker, rely on formal
specifications of the design documents. However, it is hard for ordinary software
developers to grasp formal specification languages. One possible solution
is to define an automated translation from graphical, easy-to-use design
models into the formal specification languages. For object-oriented applications,
UML has been recognized as an industrial standard and is commonly used for
software design. Here we assume that the design specifications of CORBA-based
distributed applications are given in UML notation. As the UML notations
stretch over almost all aspects of software artifacts, translating the entire domain
of UML notations is not feasible. We choose to adapt and translate an
essential subset of UML class diagrams, statechart diagrams and deployment diagrams,
which provide sufficient information about the dynamic behavior of the
applications.

When building a verification model for a CORBA-based application, the
behavior of a remote object should not be over-simplified as that of a local
object. On the other hand, as the verification process is always resource-intensive,
the verification model must be succinct and should not accommodate a
complete middleware model. In this paper, we construct an abstracted CORBA
middleware model in Promela, considering those and only those characteristics
of CORBA that are essential to the correctness of the applications, namely: (i)
binding: the connection of a client to a remote object; (ii) thread policy: the POA
(Portable Object Adaptor) policy which governs the service of remote objects;
(iii) synchronization mode: the mode for client-side remote method invocations.

Accordingly, we made a few adaptations to UML notations, including (i)
stereotypes for classes, methods and interfaces; (ii) pre-defined classes for client
binding, remote method invocations and thread policies; and (iii) pre-defined
component types for describing CORBA facilities (POAs and ORBs). These
adaptations serve as the interface of the abstract CORBA middleware model. In
this way, we preserve the clarity of the design models: the CORBA middleware
remains a black box in the design models. While generating the corresponding
Promela model from such design models, our CORBA middleware model is
integrated automatically.

We have developed a translation tool called CUP, which generates the corresponding
Promela source code from adapted UML diagrams. A concrete system
is created from a UML deployment diagram using the elements defined in
UML class diagrams. The behavior of the elements is defined in UML statechart
diagrams. The statechart diagrams in our design model are built upon parameterized
method calls, change events, guard conditions and value assignment actions.
Currently, CUP takes as its input only a specially formatted XML file that
represents the UML diagrams. The files are actually simplified versions of the standard XMI
representation of the UML diagrams. We plan to use the XMI representations
as the input of CUP in the future.

To reduce verification complexity, we translate most of the methods into
Promela inlines, which is the most efficient way to simulate method calls in
Promela [6]. For this purpose, we require that the behavior of a class is defined
on a per-method basis. Consequently, we eliminate all concurrency elements in a
statechart diagram (sync, fork, join, concurrent regions in composite states). Instead,
concurrency is achieved through some specially stereotyped methods,
which are specified as running on their own execution threads.

To further reduce the complexity, we define specially named composite states
in statechart diagrams to identify atomically executed blocks. Thus, transitions
dealing with only local variables can be organized into atomic execution blocks,
which eliminates some interleavings that do not affect the truth of the
correctness properties.

We assume that readers have basic knowledge of Promela/SPIN and UML. In
Section 2, we give a brief introduction to the parts of CORBA relevant to our work.
A motivating example is described in Section 3. This example is used throughout
the paper to demonstrate the UML adaptation, the translation and the verification
results. Our adapted UML design model for CORBA-based applications
is introduced in Section 4. In Section 5, we explain the generation of Promela
code from the design model, especially the realization of the distributed object
systems. Conclusion, related work and some final remarks are given at the end.



2 A Brief Introduction to CORBA 

CORBA is a widely accepted standard for DOS middlewares established by the
OMG. A CORBA middleware makes objects in a distributed application accessible
beyond process boundaries. The core component of the middleware is the
object request broker (ORB), which is responsible for finding a proper remote
object to service a client (binding), communicating the method invocation requests
to the remote objects and transferring the result back to the clients when
it becomes available. An ORB may be single-threaded or multi-threaded. To be
remotely accessible, a remote object must be managed by a POA object and
registered in the naming space. The POA is the interface between the objects
and the ORB. A POA manages remote objects, interprets the invocation requests
made to the objects and activates proper servant threads to service the
requests. How a POA behaves is determined by its policies. The policy we are
interested in is the thread policy, which specifies the thread model used for the
servant threads:

— main-thread. All POAs in an ORB with this thread policy will share a servant
thread provided by the ORB for all method invocations.

— thread-per-POA. The POA will create one servant thread to handle all
method invocations.

— thread-pool(n). The POA will create a set of n servant threads to handle
method invocations.

— thread-per-object. The POA will create a servant thread for each remote
object it manages.

— thread-per-client. The POA will create a servant thread for method invocations
from each client of an object it manages.

— thread-per-request. The POA will create a servant thread for each method
invocation request.

POA thread models can be used in either a single-threaded or a multi-threaded
ORB. However, in a single-threaded ORB, the behavior of any POA will be
the same as that of a main-thread one.

Since method invocations may share a thread in ORB or POA according to 
the thread policies, internal blocking is introduced to the application, which may 
cause deadlocks or request re-ordering. 

To find a remote object for service, a client must identify to the ORB the
remote object(s) it wishes to connect to, and the ORB will locate the proper
remote object for it accordingly. This process is called binding. Common binding
options are:

— Request to bind to a specific remote object. The object is identified by its 
unique identifier. 

— Request to bind to a remote object of a certain type. The type is identified 
by the remote interface of the object. 

— Request to bind to a remote object of a certain type in a certain POA. The 
POA is identified by its unique identifier. 

A remote method may be invoked in different synchronization modes. Typically,
the invocation is synchronous: the client blocks until the remote method
call returns. When a method requires no return values, the invocation can be
made asynchronously: the client continues its execution as soon as the request
is received on the other end, without waiting for its completion. If the method
invocation involves lengthy computation, a deferred synchronous call is useful. In
this invocation mode, the client continues execution after issuing the request,
and pulls the result of the remote method call sometime later.

In the next section, we give a motivating example to show the design of 
distributed applications with CORBA middleware and the possible faults that 
may be introduced. 

3 A Motivating Example 

In an auction system, multiple clients (bidders) can compete for items. A bidder
can withdraw from bidding at any time, and the bidding of an item terminates
when all bidders have withdrawn from the competition for that item, or when
the pre-set price limit for that item is reached. When the bidding of an item
terminates, whoever offered the highest bid first will succeed in the bidding.
Suppose that, for fault-tolerance purposes, there are two servers running concurrently
to service the bidders. Each of them maintains a local copy of the bidding status.
A bidder is allowed to connect to any one of the servers, yet for fairness reasons,
only one connection is allowed for each bidder. The auction server will refuse
a connection if another connection from the same bidder (determined by its
identifier) already exists.

Let us assume that such a system is designed using CORBA middleware in 
the following way: Two remote objects are used as auction servers. The bidding 
status is represented as replicated local data of these objects. The two auction 
servers synchronize with each other through remote method invocations to keep 
the bidding status consistent. When two servers try to update the bidding status 
simultaneously, a token is used to determine which server should update first. 
An active object is used to represent a bidder. Each bidder object connects to 
an auction server through a client that is bound randomly to an auction server. 
To make a bidding request, the client invokes a remote method on the auction 
server it is connected to. 

Below are some of the properties of the applications we are interested in and 
that can be verified with model checking tools: 

— The design should be deadlock-free.

— The bidding process should eventually complete. 

— One and only one of the bidders will eventually succeed in bidding an item. 

— The successful bidder must hold a bid higher than or equal to the other 
bidders. 

— The successful bid should be no higher than the pre-set price limit. 

The correctness of such properties may vary depending on whether we model
the middleware behavior or simply treat remote object invocations like
local calls. For example, an application according to the above design runs
well under multi-threaded ORBs and thread-per-client POAs, but it may run into
deadlock if we use thread-per-object POAs. In the latter case, suppose bidder b_1
made a bidding request to auction server s_1 and bidder b_2 made a bidding request
to auction server s_2 simultaneously. Server s_1 tries to synchronize with server s_2
on behalf of b_1: it makes a remote method invocation on auction server s_2 for the
update of the bidding status, while vice versa, server s_2 tries to synchronize with
server s_1. However, both these method invocations are blocked because the only
servant threads on server objects s_1 and s_2 are already occupied by bidders b_1 and
b_2 respectively, since they made the method invocations on their respective server
objects. As b_1 and b_2 will not release the servant threads of s_1 and s_2 respectively
while they wait for the results of their bidding requests, the execution runs into
a deadlock state. Such a deadlock situation cannot be identified if we model
remote method invocations in the same way as local ones.
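The deadlock scenario can be reproduced with a very small wait-for analysis. The Python sketch below is an abstraction made for illustration only: a servant thread is identified by a key that depends on the thread policy, the two outer calls are assumed to hold their servant threads while the nested calls are pending, and the names `servant_key`, `has_deadlock`, `call1` and `call2` are hypothetical.

```python
def servant_key(policy, server, client):
    """Identify the servant thread that serves a call from client on server."""
    if policy == "thread-per-object":
        return (server,)            # one thread per remote object
    if policy == "thread-per-client":
        return (server, client)     # one thread per client of the object
    raise ValueError(f"policy not covered by this sketch: {policy}")


def has_deadlock(policy):
    """Check the scenario with two bidders and two mutually synchronizing servers."""
    # Outer calls: b1 invokes s1 and b2 invokes s2. Each call occupies a
    # servant thread until its nested call has returned.
    held = {servant_key(policy, "s1", "b1"): "call1",
            servant_key(policy, "s2", "b2"): "call2"}
    # Nested calls: while serving b1, s1 invokes s2, and vice versa.
    wanted = {"call1": servant_key(policy, "s2", "s1"),
              "call2": servant_key(policy, "s1", "s2")}
    # A call waits for whichever call holds the servant thread it needs.
    waits_for = {call: held[thread]
                 for call, thread in wanted.items() if thread in held}
    # A cycle in the wait-for relation is a deadlock.
    for start in waits_for:
        seen, current = set(), start
        while current in waits_for and current not in seen:
            seen.add(current)
            current = waits_for[current]
        if current in seen:
            return True
    return False


deadlock_per_object = has_deadlock("thread-per-object")
deadlock_per_client = has_deadlock("thread-per-client")
```

Under thread-per-object the two calls wait for each other, whereas under thread-per-client the nested calls are served by separate servant threads and no cycle arises. A full Promela model explores all interleavings, which this sketch does not attempt.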

4 Adapted UML Diagrams

In this section, we introduce our UML design model for CORBA-based applications,
which serves as the input of the CUP tool.

The design model consists of a set of self-executable active objects, ORB and
POA components and remote object components deployed into the POA components.
Each object and component is assigned a unique and publicly known
integer ID. We assume all objects and components are persistent. We do not consider
the creation, de-activation and destruction of any active objects, remote
object components, POA or ORB components.

Correspondingly, the input model of the CUP tool consists of a UML deployment
diagram, which creates and deploys all object instances in the application,
and UML class diagrams and UML statechart diagrams, which specify
the objects. Each method defined in the class diagram is associated
with a statechart diagram to specify its behavior. The behavior of a class is
specified by the collection of its method statechart diagrams.

In the following, we introduce the syntax and the semantics of the adapted 
deployment diagrams, class diagrams and statechart diagrams. 

4.1 The Deployment Diagram 

The deployment diagram specifies the following aspects of the application: 

— The object and component instances and their unique identifiers;

— The deployment relationships between remote objects and POAs, and between
POAs and ORBs, which specify the management relationships among
them;

— The thread policies for POAs and ORBs, which are specified by deploying a
Policy object to a POA or an ORB.

The deployment diagram contains two nodes: a CORBA node and a Clients 
node. 

The CORBA node forms the server-side of the application. It contains one or 
more ORB components. An ORB component contains a root POA component 
and possibly a Policy object. The latter defines the thread policy of the ORB. 
If omitted, the default thread policy is single-threaded. The root POA compo- 
nent contains zero or more remote object components and zero or more POA 
components. Each POA component also contains zero or more remote object 
components and zero or more POA components. A remote object is expressed 
as a component which implements a remote object interface. A POA component 
may contain a Policy object, which defines the thread policy of the POA. The 
thread policy of root POA is always main-thread. A POA without a Policy object 
inherits the thread policy of its parent POA. 
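The inheritance rule for thread policies can be sketched as a walk up the POA hierarchy. This is an illustrative Python sketch only: the `Poa` class and the component names `poa3` and `poa4` are assumptions introduced for the example, not part of the CUP input format.

```python
from dataclasses import dataclass
from typing import Optional

ROOT_POLICY = "main-thread"  # the root POA always uses this policy


@dataclass
class Poa:
    name: str
    parent: Optional["Poa"] = None   # None marks the root POA
    policy: Optional[str] = None     # None means "no Policy object deployed"


def effective_policy(poa: Poa) -> str:
    """Resolve the thread policy of a POA by walking towards the root."""
    node = poa
    while node.parent is not None:
        if node.policy is not None:
            return node.policy       # nearest explicitly deployed policy wins
        node = node.parent
    return ROOT_POLICY               # reached the root POA


root = Poa("rootPOA")
poa2 = Poa("Poa2", parent=root, policy="thread-per-client")
poa3 = Poa("poa3", parent=poa2)      # hypothetical child without a Policy
poa4 = Poa("poa4", parent=root)      # hypothetical child without a Policy
```

Here `poa3` resolves to the policy of `Poa2`, and `poa4` falls back to the root policy.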

The Clients node contains a set of Process components. They form the client- 
side of the application. Each Process component contains an active object, which 
is self-executable and serves as the starting point of the execution. 
Fig. 1. Deployment diagram of the online auction example



Figure 1 shows the deployment diagram of the auction example, in which two
bidders participate in the auction of an item. Here, the CORBA node contains
two ORB components orb1 and orb2, both specified as multi-threaded. In the
root POA of orb1, we have POA component Poa2 whose thread policy is
thread-per-client. With this policy, it manages a remote object of class
AuctionServer with identifier 1. The Clients node contains two Process components
with active objects 10 and 11 respectively of class Bidder.



4.2 The Class Diagrams 

The class diagrams in our model contain active object classes, remote object
interfaces and classes that implement remote object interfaces. To distinguish
the specific roles of the classes in the applications, we define the class
stereotype Active for active object classes and the interface stereotype IDL for
remote interfaces. An active object is self-executable. To specify the starting
point of the execution, we define the method stereotype Main. An Active class
must contain one and only one Main-stereotyped method, which starts the
execution. We also define a similar method stereotype called Thread. A
Thread-stereotyped method is not self-executable. However, when it is called, it
will execute on its own execution thread, running concurrently with its caller.
A Thread-stereotyped method must have neither output nor return parameters.

Figure 2 shows the class diagram for the auction example. Here, an instance
of class Bidder is an active object, and its attribute serverC is a Stub. In the
statechart diagrams, it will be bound to an Auction remote object and thus
become a client of the object. Similarly, the attribute interServer in an
AuctionServer object will become a client of the other Auction remote object for
server synchronization.
Fig. 2. Class diagram in online auction example



Note that these diagrams are for illustration only. Our CUP tool currently 
works only on text-based, specially formatted input. 

As shown in this figure, to keep the verification model manageable, we make
the parameters of the remote methods uniform such that: (i) the method must
have no return value; (ii) the method must take an RM_In object as its input
parameter and possibly an RM_Out object as its output parameter. By default,
the data classes RM_In and RM_Out contain two integer parameters each.
Designers can override these classes if needed, but the parameters must be of type
integer or one of its subtypes.

With the restriction, the interface to any remote method is statically de- 
termined. That, combined with the statically created remote objects, greatly 
simplifies the realization of the binding and the remote method invocation pro- 
cess. 

As shown in the figure above, two pre-defined CORBA-related classes, namely
Stub and Response, are used to facilitate remote method invocations. A Stub
object represents a client to a remote object. The Request object in CORBA,
which is used for dynamic method invocation, is omitted in our model since
the restrictions we put on the model eliminate many of the dynamic method
invocation tasks. For simplicity, the core of the dynamic method invocation,
namely the deferred remote method calls, is packed into the Stub object. The
Response class is for getting the results of deferred remote method calls.

Figure 3 shows the details of the class definitions of the Stub and Response. 

The Stub class contains six methods: two binding methods, three remote call 
methods and an unbind method. 
«CORBA» Stub

    bind(roi: string): bool
    bind(roi: string, option: string, value: string): bool
    send_sync(methodName: string, inp: RM_In, outp: RM_Out): void
    send_async(methodName: string, inp: RM_In): void
    send_deferred(methodName: string, inp: RM_In, response: Response): void
    unbind(): void

«CORBA» Response

    get_response(outp: RM_Out): void
    poll(): bool

«CORBA» Policy

    thread_policy: string

Fig. 3. The Stub and Response classes



A Stub object becomes a client to a remote object through its binding
methods. The first parameter in both methods defines the type of the remote object
to which the Stub object will be bound. Its value must be one of the
IDL-stereotyped interface names.

— The binding method with no binding option will randomly bind the Stub 
object to a remote object which implements that interface. 

— If the option parameter is given value POA, the method will bind the Stub 
object to a remote object component implementing the given interface in 
the specific POA component. The parameter value indicates the id of that 
POA. 

— If the option parameter is given value OBJ, the method will bind the Stub 
object to a specific remote object. The parameter value indicates the id of 
that object. The object must implement the given interface. 
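The three binding variants can be summarized as a filter over the statically deployed remote objects. The sketch below is a Python illustration under stated assumptions: the `RemoteObject` record, the integer ids and the extra POA id 3 are hypothetical, and the function returns the chosen object (or `None`) rather than the `bool` of the modeled `bind` method.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RemoteObject:
    obj_id: int
    interface: str   # name of the IDL-stereotyped interface it implements
    poa_id: int      # id of the POA component that manages it


def candidates(objects: List[RemoteObject], interface: str,
               option: Optional[str] = None,
               value: Optional[int] = None) -> List[RemoteObject]:
    """Objects a Stub may be bound to under the given binding option."""
    matches = [o for o in objects if o.interface == interface]
    if option == "POA":
        matches = [o for o in matches if o.poa_id == value]
    elif option == "OBJ":
        matches = [o for o in matches if o.obj_id == value]
    elif option is not None:
        raise ValueError(f"unknown binding option: {option}")
    return matches


def bind(objects: List[RemoteObject], interface: str,
         option: Optional[str] = None,
         value: Optional[int] = None) -> Optional[RemoteObject]:
    """Pick one candidate at random; None models a failed binding."""
    matches = candidates(objects, interface, option, value)
    return random.choice(matches) if matches else None


deployed = [
    RemoteObject(obj_id=1, interface="Auction", poa_id=2),
    RemoteObject(obj_id=2, interface="Auction", poa_id=3),
]
```

For example, `bind(deployed, "Auction")` may return either object, while `bind(deployed, "Auction", "OBJ", 2)` can only return the object with id 2.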

After binding, the remote method calls to that object can be made through
the Stub object. The remote methods can be called synchronously through the
send_sync method, asynchronously through the send_async method or deferred
synchronously through the send_deferred method. The parameter methodName
in these methods specifies the remote method that should be called. The
parameters inp and outp contain input and output parameters respectively. The
Response object is for getting the result of deferred method calls. The result of
the method invocation can be pulled at any time by calling the get_response
method of the Response object.
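The difference between the three call modes can be sketched with Python futures. This is only an analogy under stated assumptions: the `Auction` class and its `client_join` method are hypothetical, results are returned as values instead of being written into an RM_Out parameter, and a thread pool stands in for the servant threads.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Optional


class Response:
    """Holds the pending result of a deferred call."""

    def __init__(self) -> None:
        self._future: Optional[Future] = None

    def poll(self) -> bool:
        # True once the deferred call has completed.
        return self._future is not None and self._future.done()

    def get_response(self) -> Any:
        # Blocks until the result of the deferred call is available.
        if self._future is None:
            raise RuntimeError("no deferred call was issued")
        return self._future.result()


class Stub:
    def __init__(self, target: Any, pool: ThreadPoolExecutor) -> None:
        self._target = target
        self._pool = pool

    def _submit(self, method_name: str, inp: Any) -> Future:
        return self._pool.submit(getattr(self._target, method_name), inp)

    def send_sync(self, method_name: str, inp: Any) -> Any:
        return self._submit(method_name, inp).result()   # caller blocks

    def send_async(self, method_name: str, inp: Any) -> None:
        self._submit(method_name, inp)                   # result is dropped

    def send_deferred(self, method_name: str, inp: Any,
                      response: Response) -> None:
        response._future = self._submit(method_name, inp)  # collect later


class Auction:  # hypothetical remote object
    def client_join(self, bidder_id: int) -> str:
        return f"joined:{bidder_id}"


with ThreadPoolExecutor(max_workers=2) as pool:
    stub = Stub(Auction(), pool)
    sync_result = stub.send_sync("client_join", 10)
    response = Response()
    stub.send_deferred("client_join", 11, response)
    deferred_result = response.get_response()
    finished = response.poll()
```

In this sketch `poll` corresponds to checking whether a result is ready, and `get_response` to retrieving it.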

A Stub object can also be disconnected from the remote object by calling the 
unbind method. After disconnection, the bind methods can be called again to 
bind the object to a remote object. 

Another class shown in Figure 3 is the Policy class. Instances of this class
define the thread policy for ORB or POA components (see Figure 1).

A Policy object contains a string attribute thread_policy, which defines the
thread policy of a POA or ORB component. The legal values of thread_policy
for a POA are thread-per-client, thread-per-obj, thread-per-POA, thread-per-ORB,
thread-pool(n) and thread-per-request. The legal values for an ORB are
single-threaded and multi-threaded. We assume a multi-threaded ORB has unlimited
thread resources and will never block. We also do not consider inter-ORB
communication at this stage.

[Figure 4: statechart diagrams of three methods; graphics not reproduced.
Recoverable transition labels are listed below.]

method clientJoin in AuctionServer
    [bidStatus!=closed & clientStatus[inP.p1]==IDLE] / clientStatus[inP.p1]=JOINED
    [else] / outP.p1=fail
    outP.p1=interC.send_sync("dup", inP.p1)

method joinBidding in Bidder
    / serverC.bind("Auction")
    / (result, highBid)=serverC.send_sync("clientJoin", idIndex)
    [result==fail] / result=ffail
    [else]

method addBid in Bidder
    / selfBid=selfBid+1
    [selfBid==BID_LIMIT]
    / assert(selfBid<=BID_LIMIT)
    [else]

Fig. 4. Partial statechart diagram in online auction example



4.3 The Statechart Diagrams 

As we mentioned in the Introduction, in the design model a statechart diagram
defines the behavior of a method. It is built upon parameterized method calls,
change events, guard conditions and value assignment actions. Unlike in
traditional state diagrams, in UML statechart diagrams a transition with a guard
condition is executable only if the condition is true at the time the system leaves
the source state of the transition. Correspondingly, UML statechart diagrams
also define change events: a transition with a change event is executable
whenever the boolean condition used in the event becomes true. Both cases are
reflected in CUP.
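The distinction between a guard condition and a change event can be made concrete over a trace of variable valuations. The following Python sketch is an illustration only; the function names and the sample trace are assumptions, and it does not show how CUP encodes either construct in Promela.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Condition = Callable[[State], bool]


def guard_enabled(cond: Condition, state_on_exit: State) -> bool:
    """A guard is checked once, when the source state is left."""
    return cond(state_on_exit)


def change_event_step(cond: Condition, trace: List[State]) -> Optional[int]:
    """A change event fires at the first step where the condition holds."""
    for i, state in enumerate(trace):
        if cond(state):
            return i
    return None


def limit_reached(state: State) -> bool:
    return state["bid"] >= 3


# Hypothetical evolution of a shared variable while the object waits.
trace = [{"bid": 1}, {"bid": 2}, {"bid": 3}]

guard_now = guard_enabled(limit_reached, trace[0])   # evaluated immediately
fires_at = change_event_step(limit_reached, trace)   # waits for the change
```

With this trace the guarded transition is not executable at the first step, whereas the change-event transition becomes executable once `bid` reaches 3.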

In the statechart diagram of method joinBidding (the second statechart di- 
agram) in Figure 4, a serverC stub is bound to an arbitrary remote object of 
AuctionServer. With this binding, remote method clientJoin (see the first stat- 
echart diagram) is called in synchronous mode. 

We eliminate all concurrency elements in a statechart diagram (sync,
fork, join, concurrent regions in composite states). Instead, concurrency is
achieved through calls to Thread-stereotyped methods when necessary.
To reduce the verification complexity of the model, we add special semantics
for composite states named atomic: they identify atomically executed blocks.
For example, in the statechart diagram of method addBid in Figure 4, generating
a new bid is a sequence of local actions that do not involve any shared data,
so these actions are specified as an atomic composite state. This reduces the
verification complexity without any side effect.

For verification purposes, we add special semantics for states whose names
start with end, progress or accept. They will be translated into their
corresponding Promela labels [6], which play important roles during the verification.
We also allow users to insert Promela assertion statements in the transition
actions as correctness properties. For example, in the statechart diagram addBid
in Figure 4, the assertion statement checks that the new bid generated is no
greater than the pre-set price limit.

Due to space limits, we will not discuss statechart diagrams in more detail
here (cf. [1]).

5 Promela Model for CORBA Distributed Object Systems

In the previous section, we have introduced our adaptation of UML for CORBA- 
based applications. In this section, we explain the corresponding Promela model. 
Basically, the Promela model is generated according to the following rules: 

— Each POA component is translated into a global channel which holds remote
method call requests and a set of servant thread processes according to its
thread policy. Each servant thread repeatedly removes the request it should
process from the POA channel, invokes the proper method inline to service the
call and sends the results (if any) back to the caller through the return
channel provided in the request.

— An ORB component with single-threaded policy is translated into a global
channel which holds remote method call requests and a dispatcher process
which repeatedly removes an element from the channel and dispatches it to
the proper POA channel.

— An ORB component with multi-threaded policy is omitted in the Promela
model: the remote method call requests are sent directly to the POA
channels.

— An active object is translated into a data attribute, a set of method inlines
(each corresponding to a method) and a set of Promela process prototypes
(each corresponding to a Main- or Thread-stereotyped method). A process of
a Main-stereotyped method prototype will be activated by the init process
of Promela.

— A remote object component is translated into an attribute of a customized
data type and a set of method inlines. One or more servant thread processes
of its POA will be assigned to service remote method invocations to this
object, using the data attributes and the method inlines.

— The binding is extracted from ORBs and is realized by a single Promela
process called cup_bind.
— The data classes RM_In and RM_Out are translated into two customized
Promela datatypes CUP_RM_In and CUP_RM_Out.

— A Response object is translated into an attribute of data type CUP_Response,
which contains a channel for retrieving the result of a remote method call.

— A Stub object is translated into an attribute of data type CUP_Stub, which
contains two channels and two integers: channel invocation is for sending
remote method calls and channel response is for getting the result of a
synchronized remote method call; integers objID and stubID are for binding
purposes.

Table 1 shows the definitions of CUP_Response and CUP_Stub.



Table 1. CUP data types for the Stub and the Response classes

typedef CUP_Stub
{
    int stubID, objID;
    chan invocation;    /* for sending remote method calls */
    /* return channel: output parameters */
    chan response = [0] of { CUP_RM_Out };
}

typedef CUP_Response    /* for deferred method calls */
{
    chan responseChan = [1] of { CUP_RM_Out };    /* output parameters */
}



The binding of a Stub object is translated into the proper initialization of
the invocation channel and the objID attribute. We chose to use a single Promela
process, instead of one for each ORB, to handle all binding requests, and each
binding request is serviced by an atomic block. This puts less strain on the state
space. Since the variables modified by the binding process are local to the stub or
to the generated servant thread, this implementation is free from side effects.

All binding requests are sent through a global channel cup_bind_request. The
request contains three mtype values representing the unique id of the Stub
object, the type of the remote object and the binding option (CUP_RANDOM,
CUP_POA or CUP_OBJ). It also contains an integer for the POA or remote object
id.

The binding process interprets the requests, finds a proper remote object
according to the binding options and returns the proper invocation channel,
together with the remote object id, on another global channel cup_bind_result to
be picked up by the requester. If the remote object is managed by a POA which
belongs to a single-threaded ORB, the request channel of that ORB will be sent
back. Otherwise, the request channel of the POA will be sent back. If the chosen
object is managed by a POA with the thread-per-client policy, the binding process
also starts a servant process to service that stub.
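The decision made by the binding process can be sketched as two small functions. This Python sketch is an assumption-laden illustration: the channel naming scheme (`<component>_request`) is hypothetical and is not the naming used in the generated Promela code.

```python
def invocation_channel(orb_name: str, orb_policy: str, poa_name: str) -> str:
    """Name of the channel returned to the stub (hypothetical naming)."""
    if orb_policy == "single-threaded":
        return f"{orb_name}_request"   # calls go through the ORB dispatcher
    return f"{poa_name}_request"       # multi-threaded ORB: straight to the POA


def starts_servant_on_bind(poa_policy: str) -> bool:
    """Only thread-per-client POAs get a new servant process per bound stub."""
    return poa_policy == "thread-per-client"


via_orb = invocation_channel("orb2", "single-threaded", "poa5")
direct = invocation_channel("orb1", "multi-threaded", "Poa2")
```

The names `orb2` with a single-threaded policy and `poa5` are made up for the example; in the auction example of Figure 1 both ORBs are multi-threaded, so only the second case applies.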
The following shows the definition of the binding channels and the translation
of the binding for a Stub object stub:

    chan cup_bind_request = [0] of { mtype, mtype, mtype, int };
    chan cup_bind_result = [0] of { chan, int };

stub.bind("Auction") is translated into

    cup_bind_request!stub_T.stubID, CUP_IDL_Auction, CUP_RANDOM, scratch;
    cup_bind_result?stub_T.invocation, stub_T.objID;

Note that each variable name may be assigned some suffixes during the
translation. As such details are omitted here (cf. [1]), we use _T to represent
such a possible suffix for readability.

A remote method call is translated into the sending of a remote method call
element to the invocation channel. For example, suppose an invocation request
for method m with input parameter ip is made through a Stub object named
s. If the method has an output parameter op, then the UML synchronous remote
method call

    s.send_sync("m", ip, op)

will be translated by CUP into

    s_T.invocation!s_T.stubID, CUP_SYNC, s_T.response, s_T.objID,
                   CUP_METHOD_m, ip_T;
    s_T.response?op_T;

The element contains the id of the stub object, the synchronization flag, the
response channel, the remote object id, the remote method id and the input
parameter. Here, CUP_METHOD_m is the generated unique identifier for method m.

Each remote method call element is extracted from the invocation channel by 
a servant process, interpreted and serviced. The functionality and the quantity 
of the servant threads are determined by the thread policy of POAs: 

— A servant process is created for each ORB which has at least one
thread-per-ORB policy POA to service all calls made to the request channel of the
ORB (to invoke a remote method managed by one of the POAs).

— A set of servant processes is created for each POA that does not have a
thread-per-ORB policy. They service specific calls made to the request
channel of the POA:

• thread-per-client: A servant process is created for each stub object which
is a client of a remote object managed by the POA to service the calls
from the stub object.

• thread-per-obj: A servant process is created for each remote object
managed by the POA to service the calls to the object.

• thread-per-POA: A servant process is created to service all the calls to
the POA.

• thread-pool(n): A set of n servant processes is created to service all the
calls to the POA.

• thread-per-request: A servant process is created for each invocation
request. The process terminates after the call is completed.
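The rules above determine how many servant processes a POA contributes to the model. The following Python sketch restates them as a function; the parameter names are assumptions, and for thread-per-request the count is taken to be the number of requests currently being served.

```python
import re


def servant_process_count(policy: str, n_stubs: int = 0, n_objects: int = 0,
                          n_active_requests: int = 0) -> int:
    """Servant processes owned by one POA under the given thread policy."""
    if policy == "thread-per-client":
        return n_stubs                 # one per bound stub object
    if policy == "thread-per-obj":
        return n_objects               # one per managed remote object
    if policy == "thread-per-POA":
        return 1
    if policy == "thread-per-request":
        return n_active_requests       # created per call, ends with the call
    if policy == "thread-per-ORB":
        return 0                       # served by the ORB-level servant instead
    pool = re.fullmatch(r"thread-pool\((\d+)\)", policy)
    if pool:
        return int(pool.group(1))
    raise ValueError(f"unknown thread policy: {policy}")


# Auction example of Figure 1: two Bidder stubs bound to a thread-per-client POA.
auction_servants = servant_process_count("thread-per-client", n_stubs=2)
```

Since each servant thread is a separate Promela process, this count directly affects the size of the state space explored by SPIN.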

To service a call, a servant thread will first determine a proper inline to 
invoke, according to the method identifier and the remote object identifier sent 
as part of the request. Then the inline will be invoked with the input parameter 
from the request, and the output parameter, if any, will be updated in the inline 
and finally be sent back to the return channel provided in the request. 
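One iteration of a servant thread can be sketched as follows. This Python sketch uses queues in place of Promela channels and is not the generated code: the `INLINES` table, the `client_join` body and the tuple layout of a request are assumptions modeled on the request element described earlier.

```python
import queue
from typing import Any, Callable, Dict, Tuple


def client_join(obj_state: Dict[str, Any], bidder_id: int) -> str:
    """Hypothetical stand-in for the inline generated for clientJoin."""
    if obj_state["clients"].get(bidder_id, "IDLE") == "IDLE":
        obj_state["clients"][bidder_id] = "JOINED"
        return "ok"
    return "fail"


# (remote object id, method id) -> inline to invoke; hypothetical table
INLINES: Dict[Tuple[int, str], Callable[[Dict[str, Any], int], str]] = {
    (1, "clientJoin"): client_join,
}


def serve_one(poa_channel: "queue.Queue", objects: Dict[int, Dict[str, Any]]) -> None:
    """Remove one request from the POA channel and service it."""
    stub_id, mode, reply, obj_id, method_id, inp = poa_channel.get()
    inline = INLINES[(obj_id, method_id)]      # select the inline to invoke
    out = inline(objects[obj_id], inp)         # run it on the object's data
    if mode == "SYNC":
        reply.put(out)                         # send result on return channel


poa_channel: "queue.Queue" = queue.Queue()
reply: "queue.Queue" = queue.Queue()
objects = {1: {"clients": {}}}

poa_channel.put((10, "SYNC", reply, 1, "clientJoin", 10))
serve_one(poa_channel, objects)
result = reply.get_nowait()
```

A real servant process would repeat this step in a loop; a single iteration is shown to keep the example deterministic.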

Due to the limit of space, we will not discuss the generation of the inlines 
here. 



6 Related Work 

To boost the acceptance of formal verification techniques, various researchers
have been exploring ways to loosen the restriction that a system must be
described formally for verification purposes. An alternative is to derive formal
descriptions automatically from existing software artifacts, so that software
developers are relieved from learning a formal specification language and from
writing correct specifications in it. Since UML is well accepted by the software
industry as a graphical design notation, people have been studying the
formalization of UML descriptions. In particular, people have discussed various
issues regarding the translation from formalized UML statechart diagrams into
PROMELA [10,11,13,14].

Our work also involves such a translation. Its details, which are outside the
scope of the present paper, are reported in [1]. The most prominent feature of our
translation of UML statechart diagrams (as well as class diagrams and
deployment diagrams) is that our focus is application-oriented. While other
UML-to-Promela translation tools are available, ours specifically deals with
parameterized remote and local method calls and excludes signal events. We allow
rich transition syntax while putting many restrictions on states, especially those
related to concurrency control. The result is an efficient translation that
translates each statechart into a Promela process or a Promela inline. In this
regard, our translation shares some commonality with Java PathFinder [5]. Our
treatment greatly reduces the number of processes in the translated code. We also
distinguish between change events and guard conditions, which is not addressed
in other translators. Our focus has been put on introducing the realization of the
CORBA distributed object system in Promela, which simplifies the verification
task for distributed applications built on CORBA-based middleware.

The verification of distributed computing environments has also gained much
attention. Various works have been conducted to verify the correctness
of communication protocols on different layers. For example, the specifications
of CORBA ORB (Object Request Broker) properties are discussed in [2]. The
descriptions of CORBA GIOP (General Inter-ORB Protocol) in PROMELA
can be found in [8]. The motivation behind these works is to check the
correctness of the protocols themselves. Our work, on the other hand, is targeted
at the correctness of the design models of the applications built on top of the
CORBA middleware. In doing so, we have not only abstracted and formalized
the CORBA middleware but also embedded it automatically into the design
notations during the translation of the adapted UML diagrams.

There are two main middleware families: the DOS middleware and the
message-oriented middleware. While we have reported our work for the former,
one can also find related work for publish-subscribe middleware, one of the
major kinds of message-oriented middleware [3]. In [3], the authors used
reusable, parameterized state machine models to define the behavior of a generic
publish-subscribe middleware, and developed a translation tool to transform it,
together with the descriptions of the components of the applications, into a
specific verification model for SMV. The middleware interfaces we used here are
an object-based, generic form in UML to describe the reusable, parameterized
state machine.

A similar work that involves both the formalism of CORBA and its integra- 
tion into the design specifications of the applications can be found in [9]. There, 
the authors have presented a technique on modeling the behavior of CORBA 
ORB in Finite State Processes (FSP) [12] so that the related model checking 
tool Labeled Transition System Analyzer (LTSA) [12] can apply. To incorporate 
the modeling of the ORB into the specification of the design models in UML, 
the authors suggested to use the stereotypes on class diagrams to define the 
threading policy (single-threaded class or multithreaded class), and on method 
invocations in statechart diagrams to define the synchronization modes of the 
calls. In reality, however, the thread policy is defined on ORBs and POAs, and 
instances of the same class (of the application) may be managed by different 
POAs possibly with different thread policy. Here we have explicitly defined ded- 
icated components for POAs and ORBs, each with an attribute to specify the 
corresponding thread policy. 

There are quite a few model-checking tools available off-the-shelf. We chose
SPIN because it is a popular, mature and powerful tool. It allows us to benefit 
from a lot of important features it provides, e.g. on-the-fly verification, sup- 
port for both rendezvous and buffered message passing, efficient partial order 
reduction. Our focus is on the transparency and the efficiency of the translated 
model. For efficiency, our model is basically static and methods are implemented 
as PROMELA inlines to reduce resource consumption. If desired, CUP can be 
modified without difficulty to utilize the extensions of SPIN such as dSPIN [4], 
which may provide some additional features. 

7 Conclusion and Final Remark 

CUP is for those users who have little knowledge of formal verification techniques
but desire to use them. In particular, we consider applying model checking tools
to CORBA-based applications.

The major issue of the tool is how to model the applications so that the
chosen model checking tool can work effectively. Even with the effectiveness
of SPIN, we still need to make our best effort to reduce the complexity of the
generated Promela model, because the difficulty comes from two dimensions: one
is the complexity of the application logic, and the other is the realization of
the CORBA-based middleware.

Regarding the application logic, we translated most statechart diagrams into
Promela inlines, for which we have to generate a method invocation graph and
create different suffixes for labels, method names and variable names
accordingly. We also added special semantics to composite states to identify
atomically executed blocks. Regarding middleware modeling, we first simplified
naming, binding, etc. The naming is omitted and binding requests are
serviced by a single process, with each binding request modeled as an atomic
block. Each servant thread is modeled as a Promela process. By minimizing
the processes used for remote method calls, the complexity is reduced and the
resulting model can remain manageable.

We verified the Promela source file generated by the current version of CUP for
the auction application on a Toshiba Satellite notebook computer running
Microsoft Windows XP, with a 3.06 GHz Pentium(R) 4 CPU and 512 MB of memory.
The deadlock check on the model with two bidders passed in 160 minutes with
exhaustive checking, and in less than 7 minutes with supertrace checking
with over 99% coverage. We also verified some LTL formulas while polishing the
auction application. We identified errors when verifying the following properties:

— One and only one of the bidders should finally succeed and the others should
finally fail.

#define ppp ((Bidder[0].result == fsucc) & (Bidder[1].result == ffail))
#define qqq ((Bidder[1].result == fsucc) & (Bidder[0].result == ffail))

LTL: <>(ppp || qqq)
error: Both bidders fail.
reason: A bidder may not bid at all before withdrawing.
solution: Let each bidder bid at least once before withdrawing.

— The successful bidder should hold the highest bid.

#define rr1 ((Bidder[0].result == fsucc) & (Bidder[0].selfBid == Auction[0].highBid))
#define rr2 ((Bidder[1].result == fsucc) & (Bidder[1].selfBid == Auction[0].highBid))

LTL: <>(rr1 || rr2)
error: The final bid is higher than the highBid in the auction server.
reason: Error in auction server synchronization.
solution: Change made in the dup method.

Compared with the simplified version of our work, where middleware is not
modeled and the remote objects are treated as access-free local objects, CUP
adds some overhead to the execution cost in order to handle each client
thread and each remote method call issued by a client thread. This overhead is
determined by many factors. Intuitively, the impact of the overhead of handling
the client thread decreases as the workload of the client thread increases,
and the impact of the overhead of handling the remote method call header
increases with the frequency of the remote calls: if the clients make frequent
yet short remote method calls, this will cause a great deal of overhead.

We have run the auction example with two bidders under the thread-per-client
mode, with the pre-set price limit set to 2, 3, 4, 5 and 6. The results are:

Price limit   Depth   States     Transitions
6             628     5.5e+007   7.23034e+007
5             586     3.1e+007   4.0888e+007
4             570     1.4e+007   1.86401e+007
3             566     5e+006     6.79471e+006
2             531     1e+006     1.3985e+006

The workload of the client threads and the number of remote method calls in each
client thread increase with the pre-set price limit. From the above data, we can see
that the impact of the thread overhead (1e+006) becomes negligible as the price limit
increases. On the other hand, the overhead on the remote method calls remains
constant and cannot be ignored.

The state explosion problem due to the increased complexity of the application
itself is unavoidable in CUP: typically, the number of states will increase
exponentially with the number of active servant threads. For example, if three
bidders instead of two compete for an item with price limit 3, the number of states
increases rapidly to 3.5e+008.
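The growth figures reported above can be compared with a few lines of arithmetic. The Python sketch below only reuses the numbers quoted in the text (state counts as rounded there); the variable names are assumptions.

```python
# States and transitions per price limit, two bidders, as reported above.
states = {2: 1e6, 3: 5e6, 4: 1.4e7, 5: 3.1e7, 6: 5.5e7}
transitions = {2: 1.3985e6, 3: 6.79471e6, 4: 1.86401e7,
               5: 4.0888e7, 6: 7.23034e7}

# Growth factor of the state count when the price limit rises by one.
growth = {k: states[k] / states[k - 1] for k in range(3, 7)}

# Average number of transitions per reported state.
trans_per_state = {k: transitions[k] / states[k] for k in states}

# Effect of adding a third bidder at price limit 3 (3.5e+008 states).
three_bidder_factor = 3.5e8 / states[3]

for k in sorted(growth):
    print(f"limit {k - 1} -> {k}: x{growth[k]:.2f}")
print(f"third bidder at limit 3: x{three_bidder_factor:.0f}")
```

On these figures, raising the price limit by one multiplies the state count by a factor that shrinks from 5 to below 2, whereas adding a third bidder at price limit 3 multiplies it by 70.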
Abstract. The notion that certain procedures are atomic provides a
valuable partial specification for many multithreaded software systems. 
Several existing tools verify atomicity by showing that every interleaved 
execution reduces to an equivalent serial execution (in which the ac- 
tions of each atomic procedure are not interleaved with actions of other 
threads). However, experiments with these tools have highlighted a num- 
ber of interesting procedures that, although atomic, are not reducible. 
This paper presents a more complete technique for verifying atomicity. 
Essentially, this technique explores non-serial and serial executions of the 
multithreaded system simultaneously to ensure that every non-serial ex- 
ecution yields the same final state as the corresponding serial execution. 
Using the SPIN model checker, we have applied this technique to verify 
the atomicity of a number of irreducible procedures that could not be 
handled by previous reduction-based tools for checking atomicity. 



1 Multithreading and Atomicity 

The development and validation of multithreaded software systems is an impor- 
tant yet challenging problem. In particular, standard techniques such as testing 
and manual code inspection are often inadequate for multithreaded systems, due 
to the large number of possible thread interleavings. Model checking provides 
a promising technique for ensuring that a system’s implementation satisfies its 
specification under all possible thread interleavings. 

A prerequisite of model checking is developing an appropriate specification. 
For many interesting software systems, writing a sufficiently-complete specifi- 
cation is non-trivial. As an example, consider the filesystem procedure create, 
which creates a new file. A specification of the exact effect of create on the 
concrete filesystem state would be quite verbose. Alternatively, we could specify 
the behavior of create on an abstraction of the filesystem state, but we would 
then need an abstraction invariant relating concrete and abstract states, and 
such abstraction invariants are also quite complex. 

* This work was partly supported by the NSF under Grant CCR-03411797 and by 
faculty research funds granted by the University of California at Santa Cruz. 
For many multithreaded procedures such as create, the notion of atomicity
provides a lightweight yet valuable partial specification. Informally, a procedure 
is atomic if for every (arbitrarily-interleaved) program execution, there is an 
equivalent execution with the same overall behavior where the atomic procedure 
is executed serially, that is, the procedure’s execution is not interleaved with ac- 
tions of other threads. This atomicity guarantee reduces the challenging problem 
of reasoning about the procedure’s behavior in a multithreaded context to the 
simpler problem of reasoning about the procedure’s sequential behavior. The lat- 
ter problem is significantly more amenable to standard techniques such as testing 
and manual code inspection. In addition, many programming errors associated 
with improper synchronization can be detected as atomicity violations. 

We formalize this notion of atomicity by modeling multithreaded program 
execution as a transition system and using two transition relations. The standard 
transition relation $\rightarrow$ interleaves steps of the various threads in an arbitrary 
manner. The serial transition relation $\mapsto$ also interleaves steps of the various 
threads, provided no thread is executing an atomic procedure. Once a thread 
enters an atomic procedure, then the serial transition relation executes that 
procedure to completion, without interleaved steps of other threads. 

Reasoning about program behavior is much easier under the serial semantics 
($\mapsto$) than under the standard semantics ($\rightarrow$), since each atomic block can be 
understood sequentially, without the need to consider all possible interleaved 
actions of concurrent threads. However, standard language implementations 
only provide the standard semantics ($\rightarrow$), which admits additional transition 
sequences and behaviors, and a program that behaves correctly according to 
the serial semantics may still behave erroneously under the standard semantics. 
Thus, in addition to being correct with respect to the serial semantics, the program 
should also use sufficient synchronization to ensure the atomicity of each 
block of code that is intended to be atomic. That is, for any program execution 
$\sigma_0 \rightarrow^* \sigma$ from the initial state $\sigma_0$ (where, for simplicity, we assume no thread 
is executing an atomic block in $\sigma$), there should exist an equivalent serial execution 
$\sigma_0 \mapsto^* \sigma$. We call this the atomicity requirement on program executions, 
and correctly synchronized programs should satisfy this requirement. 

Over the past year, a number of tools have been developed for verifying 
this atomicity requirement, using techniques such as theorem proving [11], static 
typing systems [9,10], dynamic analysis [8,23], and model checking [13]. All these 
approaches are based on reduction, either Lipton’s theory of reduction [16] or 
partial order reduction [21]. 

Reduction suffices to verify the atomicity of many procedures with straight- 
forward synchronization, but is often inadequate for procedures that use more 
subtle synchronization. This paper introduces commit-atomicity, which is a more 
general technique for verifying atomicity. This technique is based on exploring 
serial and non-serial executions of the program simultaneously, and checking 
that both executions yield the same final state. Commit-atomicity is capable of 
verifying the atomicity of many procedures that cannot be handled by existing 
atomicity-checking tools based on reduction. 
The presentation of our results proceeds as follows. The following section 
reviews reduction and provides an illustration of limitations of that technique. 
Section 3 introduces a semantics for multithreaded programs that we use as the 
basis for our formal development. Section 4 describes our technique for verifying 
commit-atomicity during model checking. Section 5 provides an evaluation of this 
technique using the SPIN model checker on four benchmark programs. Section 6 
discusses related work, and we conclude with Section 7. 



2 Reduction 

The essential idea behind reduction is to transform an interleaved (non-serial) 
execution of an atomic procedure into a serial execution of that procedure by 
commuting adjacent actions of concurrent threads. For example, consider the 
first execution trace in the diagram below, in which one thread executes a pro- 
cedure that (1) acquires a lock m, (2) reads a variable x protected by that lock, 
(3) updates x, and then (4) releases m. The execution of this procedure is interleaved 
with some actions $b_1$, $b_2$, $b_3$ of a second thread, which do not access x. 
Hence, the read and write of x by the first thread commute with the operations 
of the second thread. In addition, the acquire operation right-commutes and the 
release operation left-commutes with the operations of the second thread, as il- 
lustrated by the following diagram. Hence, via reduction, we obtain an equivalent 
serial execution with the same final state in which the actions of the procedure 
are not interleaved with operations of other threads. Thus, reduction suffices to 
prove that the first execution trace is serializable, that is, it has an equivalent 
serial trace. If every execution trace through the procedure is serializable, we say 
the procedure is atomic. 

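Reduction example 

$$\sigma_0 \xrightarrow{acq(m)} \sigma_1 \xrightarrow{b_1} \sigma_2 \xrightarrow{rd(x,0)} \sigma_3 \xrightarrow{b_2} \sigma_4 \xrightarrow{wr(x,1)} \sigma_5 \xrightarrow{b_3} \sigma_6 \xrightarrow{rel(m)} \sigma_7$$

reduces to 

$$\sigma_0 \xrightarrow{b_1} \sigma_1' \xrightarrow{acq(m)} \sigma_2' \xrightarrow{rd(x,0)} \sigma_3' \xrightarrow{wr(x,1)} \sigma_4' \xrightarrow{rel(m)} \sigma_5' \xrightarrow{b_2} \sigma_6' \xrightarrow{b_3} \sigma_7$$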
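The commuting argument can be illustrated with a small sketch. It assumes a store modeled as a Python dict and actions modeled as pure functions; the variable `y` touched by the second thread and the thread-local `t` are hypothetical stand-ins that do not appear in the trace above.

```python
# Toy store: lock m, variable x protected by m, and y used only by thread 2.
def acq(s): return {**s, "m": True}            # thread 1 acquires m
def rd(s):  return {**s, "t": s["x"]}          # thread 1 reads x into a local
def wr(s):  return {**s, "x": s["t"] + 1}      # thread 1 updates x
def rel(s): return {**s, "m": False}           # thread 1 releases m
def b(s):   return {**s, "y": s["y"] + 1}      # thread 2: never touches x or m

def run(trace, store):
    """Apply the actions of a trace in order and return the final store."""
    for action in trace:
        store = action(store)
    return store

init = {"m": False, "x": 0, "y": 0, "t": None}
interleaved = [acq, b, rd, b, wr, b, rel]      # first trace of the diagram
serial = [b, acq, rd, wr, rel, b, b]           # after commuting thread 2's actions

print(run(interleaved, init) == run(serial, init))
```

Because the actions of the second thread touch disjoint parts of the store, moving them past the actions of the first thread leaves the final store unchanged.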

Reduction suffices to verify the atomicity of many procedures that follow 
straightforward synchronization disciplines. However, during our experiments 
with atomicity-checking tools based on reduction, we repeatedly encountered 
procedures that, although atomic, are not reducible. As an example, the proce- 
dure acquire shown on the next page uses a combination of busy-waiting and a 
compare-and-swap (CAS) operation to acquire a mutually-exclusive lock m (represented 
as a boolean). The operation CAS(m,false,r) has no effect and returns 
false if m ≠ false. However, if m = false, then the operation CAS(m,false,r) 
swaps m and r and returns true. 
Procedures acquire and do_transaction 

    void acquire() { 
        boolean r := true; 
        while (r == true) { 
            CAS(m, false, r); 
        } 
    } 

    void do_transaction() { 
        while (true) { 
            acquire(mutex); 
            int t := data; 
            release(mutex); 

            // long computation 
            int fdata := f(t); 

            acquire(mutex); 
            if (t == data) { 
                data := fdata; 
                release(mutex); 
                return; 
            } 
            release(mutex); 
        } 
    } 
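For readers who want to experiment with the busy-waiting lock, the following is a minimal runnable sketch in Python. Python has no hardware compare-and-swap, so the `BoolCell` class (a hypothetical name) emulates one with an internal guard lock, and `cas` returns the new value of `r` because Python cannot pass `r` by reference. The `time.sleep(0)` call is only there to yield to other threads and is not part of the procedure above.

```python
import threading
import time

class BoolCell:
    """A shared boolean with an emulated compare-and-swap."""
    def __init__(self, value=False):
        self.value = value
        self._guard = threading.Lock()     # stands in for hardware atomicity

    def cas(self, expected, r):
        """If the cell equals `expected`, swap it with r.

        Returns (succeeded, new value of r).
        """
        with self._guard:
            if self.value != expected:
                return False, r            # no effect
            self.value, r = r, self.value  # swap m and r
            return True, r

def acquire(m):
    r = True
    while r:                               # busy-wait until the swap hands back false
        _, r = m.cas(False, r)
        if r:
            time.sleep(0)                  # yield; an addition for this sketch only

def release(m):
    with m._guard:
        m.value = False

def run(n_threads=4, iterations=100):
    """Each thread increments a shared counter under the busy-waiting lock."""
    m, data = BoolCell(False), [0]
    def worker():
        for _ in range(iterations):
            acquire(m)
            data[0] += 1                   # critical section
            release(m)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data[0]

print(run())
```

If the lock provides mutual exclusion, no increment is lost and the counter ends at `n_threads * iterations`.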



A non-serial execution of acquire is shown in column (a) below, in which 
the acquire operation performed by thread T1 is interleaved with an operation of 
thread T2 that resets m to false. This execution reduces to the serial execution of 
column (b), since the operation of T2 can commute to the start of the execution 
sequence, before acquire begins. 


Executions of acquire 

(a) Non-serial execution      (b) Serial execution          (c) Non-serial execution 
T1: acquire() begins          T2: m := false                T1: acquire() begins 
T1: r := true                 T1: acquire() begins          T1: r := true 
T1: assume r == true          T1: r := true                 T1: assume r == true 
T2: m := false                T1: assume r == true          T1: CAS(m,false,r) fails 
T1: CAS(m,false,r) ok         T1: CAS(m,false,r) ok         T1: assume r == true 
T1: assume r != true          T1: assume r != true          T2: m := false 
T1: acquire() ends            T1: acquire() ends            T1: CAS(m,false,r) ok 
                                                            T1: assume r != true 
                                                            T1: acquire() ends 

Column (c) shows an alternative non-serial execution of acquire in which 
the CAS operation initially fails, and the busy-waiting loop iterates until the CAS 
operation succeeds. Note that since the execution of column (c) contains more 
instructions than that of column (b), we clearly cannot commute the execution of column 
(c) into the serial execution of column (b). Thus, even though the execution 
of column (c) is equivalent to the serial execution of column (b), in the sense 
that both executions yield the same final state, reduction is inadequate to verify 
this equivalence. Thus, the procedure acquire is atomic yet not reducible, and 
current atomicity checking tools based purely on reduction (using either type 
systems [9,10], dynamic analysis [8,23], or model checking [13]) cannot verify 
the atomicity of acquire. The Calvin-R tool [11] uses a combination of iterated 
abstraction and reduction to verify such atomicity properties, but requires 
additional programmer annotations to “guide” the right abstraction. Commit- 
atomicity is intended to verify such atomicity properties automatically. 
Another example of an atomic yet irreducible procedure is the procedure 
do_transaction shown above. In this example, the global variable data is protected 
by mutex, and the procedure do_transaction updates data according to 
data := f(data). However, the calculation of f(data) requires a long computation. 
To avoid holding mutex while computing f(data), the procedure acquires 
mutex, reads data, and releases mutex. The procedure then computes f(data), 
and, if data has not changed, updates data with f(data). If data has changed, 
then the transaction is retried. 

The procedure do_transaction is atomic, in the sense that each execution 
is equivalent to some serial execution. However, in every serial execution, the 
procedure returns during the first iteration of the loop, but there are many 
non-serial executions where the loop iterates many times. Thus, these non-serial 
executions cannot reduce to the equivalently-behaved serial executions, and so 
reduction is again inadequate to verify the atomicity of do_transaction. In 
contrast, commit-atomicity is capable of verifying the atomicity of both of these 
irreducible procedures. 
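The retry pattern of do_transaction can be tried out with the following minimal sketch. It assumes Python's `threading.Lock` in place of the busy-waiting mutex and uses a hypothetical `f` that simply increments its argument as a stand-in for the long computation.

```python
import threading

def run(n_threads=8):
    """Run n_threads concurrent transactions and return the final value of data."""
    mutex = threading.Lock()
    state = {"data": 0}

    def f(t):
        return t + 1                       # hypothetical stand-in for the long computation

    def do_transaction():
        while True:
            with mutex:
                t = state["data"]          # read data under the lock
            fdata = f(t)                   # computed without holding mutex
            with mutex:
                if t == state["data"]:     # data unchanged: commit
                    state["data"] = fdata
                    return
            # data changed in the meantime: retry the transaction

    threads = [threading.Thread(target=do_transaction) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["data"]

print(run())
```

Each transaction commits exactly once, applying `f` to the value of `data` at its commit point, so with this `f` the final value equals the number of threads regardless of how many retries occur.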



3 Multithreaded Programs 

To provide a formal basis for reasoning about atomicity, we start by formalizing 
an execution semantics for multithreaded programs. In this semantics, a multi- 
threaded program consists of a number of concurrently executing threads, each 
of which has an associated thread identifier $i \in Tid$. The threads communicate 
through a shared store $\sigma \in Store$, and system execution starts in an initial store 
$\sigma_0$. The exact structure of the store is left unspecified as it is orthogonal to our 
development. The behavior of each thread $i$ is specified by a partial function 

$T_i : Store \rightharpoonup Store$ 

which performs a single step of that thread. 



3.1 Standard Semantics 

The standard semantics of the entire multithreaded program is defined as a 
non-deterministic interleaving of steps of the various threads. The transition relation 
$\sigma \rightarrow \sigma'$ performs a single step of an arbitrarily chosen thread. We use $\rightarrow^*$ to 
denote the reflexive-transitive closure of $\rightarrow$. 

Standard semantics: $\sigma \rightarrow \sigma'$ 

$\sigma \rightarrow \sigma'$ iff $\exists i \in Tid.\ T_i(\sigma, \sigma')$ 



3.2 Serial Semantics 

We assume that each thread in the multithreaded program contains a number 
of atomic blocks, and that each atomic block has a particular commit point 
where, from the perspective of other threads, the entire block appears to happen 
atomically. We assume that for each thread $i$ the function 

$\mathcal{A}_i : Store \rightarrow \{Outside, PreCommit, PostCommit\}$ 

indicates the phase of thread $i$, that is, whether thread $i$ is 

1. outside an atomic block (Outside); 

2. in the pre-commit phase of an atomic block (PreCommit); or 

3. in the post-commit phase of an atomic block (PostCommit). 

This phase information might be determined by examining the program counter 
of thread i recorded in the store. We require that no atomic block is active in the 
initial state; that the phase of one thread is not affected by a step of a different 
thread; and that each thread cannot directly transition from the post-commit 
phase of one atomic block to the pre-commit phase of a subsequent atomic 
block (it must have an intermediate state that is outside any atomic block). We 
formalize these requirements as follows: 

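- $\mathcal{A}_i(\sigma_0) = Outside$ for all $i \in Tid$. 

- if $T_i(\sigma, \sigma')$ then $\forall j \neq i.\ \mathcal{A}_j(\sigma) = \mathcal{A}_j(\sigma')$. 

- if $T_i(\sigma, \sigma')$ then $\mathcal{A}_i(\sigma) \neq PostCommit$ or $\mathcal{A}_i(\sigma') \neq PreCommit$. 

The relation $\mathcal{A}(\sigma)$ holds if any thread is inside an atomic block: 

$\mathcal{A}(\sigma) = \exists i \in Tid.\ \mathcal{A}_i(\sigma) \neq Outside$ 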

The following serial transition relation $\mapsto$ is similar to the standard relation 
$\rightarrow$, except that a thread cannot perform a step if another thread is inside an 
atomic block. Thus, the serial relation $\mapsto$ does not interleave the execution of an 
atomic block with instructions of concurrent threads. 

Serial semantics: $\sigma \mapsto \sigma'$ 

$\sigma \mapsto \sigma'$ iff $\exists i \in Tid.\ (T_i(\sigma, \sigma') \wedge \forall j \neq i.\ \mathcal{A}_j(\sigma) = Outside)$ 

Reasoning about program behavior is much easier under the serial semantics 
($\mapsto$) than under the standard semantics ($\rightarrow$) that is provided by standard language 
implementations. However, a program that behaves correctly according 
to the serial semantics may still behave erroneously under the standard semantics. 
Thus, in addition to being correct under the serial semantics, the program 
should also use sufficient synchronization to ensure the atomicity of each block of 
code that is intended to be atomic. That is, for any program execution $\sigma_0 \rightarrow^* \sigma$ 
where $\neg\mathcal{A}(\sigma)$, there should exist an equivalent serial execution $\sigma_0 \mapsto^* \sigma$. We call 
this the atomicity requirement on program executions, and correctly synchronized 
programs should satisfy this requirement. (The restriction $\neg\mathcal{A}(\sigma)$ avoids 
consideration of partially-executed atomic blocks.) 
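The difference between the two semantics can be made concrete with a small brute-force sketch. It assumes a toy program that is not taken from the paper: each thread runs one "atomic" block that performs an unsynchronized increment (`t := x; x := t + 1`). A store is a tuple `(x, threads)`, where each thread has a program counter and a local `t`, and a thread is inside its atomic block exactly when its program counter is 1. All names are hypothetical.

```python
def step(state, i):
    """T_i: one step of thread i, or None if thread i has terminated."""
    x, threads = state
    pc, t = threads[i]
    if pc == 0:                        # t := x   (enters the atomic block)
        new = (1, x)
    elif pc == 1:                      # x := t + 1   (leaves the atomic block)
        x, new = t + 1, (2, t)
    else:
        return None
    return (x, threads[:i] + (new,) + threads[i + 1:])

def inside(state, i):
    """True if the phase of thread i is not Outside."""
    return state[1][i][0] == 1

def reachable(init, serial):
    """All stores reachable under the standard or the serial transition relation."""
    seen, todo = {init}, [init]
    n = len(init[1])
    while todo:
        s = todo.pop()
        for i in range(n):
            if serial and any(inside(s, j) for j in range(n) if j != i):
                continue               # another thread is inside an atomic block
            nxt = step(s, i)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def final_values(init, serial):
    """Values of x in reachable stores where every thread has terminated."""
    return {s[0] for s in reachable(init, serial)
            if all(pc == 2 for pc, _ in s[1])}

init = (0, ((0, None), (0, None)))
print(sorted(final_values(init, serial=False)))
print(sorted(final_values(init, serial=True)))
```

Under the standard semantics the lost-update outcome `x = 1` is reachable, whereas under the serial semantics only `x = 2` is, so this toy program violates the atomicity requirement.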
4 Model Checking Commit-Atomicity 

In this section, we present an instrumented semantics that detects violations of 
the atomicity requirement described above. The instrumented semantics only admits 
execution sequences that are serializable, and goes wrong on non-serializable 
sequences. To determine whether a given execution sequence is serializable, the 
instrumented semantics extends the state space with a shadow store $\rho \in Store$. 
Program operations in the pre-commit or post-commit phase of an atomic block 
operate as expected on the normal store $\sigma$, and do not affect the shadow store $\rho$. 
However, when an atomic block commits, the entire atomic block is executed in 
a serial manner on the shadow store. Thus, the shadow store reflects the serial 
execution of all committed atomic blocks. The shadow store is used to verify the 
serializability of the given execution sequence. 

The instrumented transition relation $(\sigma, \rho) \Rightarrow (\sigma', \rho')$ is defined below. If 
no atomic block is executing on the shadow store (that is, $\neg\mathcal{A}(\rho)$), then the 
instrumented semantics performs a step of an arbitrary thread on the normal 
store. If this step is in the pre-commit or post-commit phase of an atomic block, 
then no action is performed on the shadow store, via the rules [pre-commit] and 
[post-commit]. However, if the step is the commit action of an atomic block, 
then the serial execution of that atomic block on the shadow store is initiated 
via the rule [commit] . As expected, commit actions include transitions from the 
PreCommit to PostCommit phase of an atomic block. However, commit actions 
also include: 

— transitions from Outside to PostCommit , where an action enters and imme- 
diately commits an atomic block; 

— transitions from PreCommit to Outside , where an action commits and im- 
mediately exits an atomic block; and 

— transitions from Outside to Outside, where the “atomic block” only contains 
a single action. 



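Instrumented semantics: $(\sigma, \rho) \Rightarrow (\sigma', \rho')$ and $(\sigma, \rho) \Rightarrow wrong$ 

[pre-commit] 
$$\frac{\neg\mathcal{A}(\rho) \qquad T_i(\sigma, \sigma') \qquad \mathcal{A}_i(\sigma') = PreCommit}{(\sigma, \rho) \Rightarrow (\sigma', \rho)}$$

[post-commit] 
$$\frac{\neg\mathcal{A}(\rho) \qquad T_i(\sigma, \sigma') \qquad \mathcal{A}_i(\sigma) = PostCommit}{(\sigma, \rho) \Rightarrow (\sigma', \rho)}$$

[commit] 
$$\frac{\neg\mathcal{A}(\rho) \qquad T_i(\sigma, \sigma') \qquad \mathcal{A}_i(\sigma) \in \{PreCommit, Outside\} \qquad \mathcal{A}_i(\sigma') \in \{PostCommit, Outside\} \qquad T_i(\rho, \rho')}{(\sigma, \rho) \Rightarrow (\sigma', \rho')}$$

[shadow] 
$$\frac{\mathcal{A}_i(\rho) \in \{PreCommit, PostCommit\} \qquad T_i(\rho, \rho')}{(\sigma, \rho) \Rightarrow (\sigma, \rho')}$$

[wrong] 
$$\frac{\neg\mathcal{A}(\sigma) \qquad \neg\mathcal{A}(\rho) \qquad \sigma \neq \rho}{(\sigma, \rho) \Rightarrow wrong}$$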

Once the execution of an atomic block on the shadow store is initiated via 
[commit], then the execution of that atomic block continues in a serial 
(non-interleaved) manner via the rule [shadow] until it completes. Thus, in any 
reachable instrumented state $(\sigma, \rho)$, the shadow store reflects the serial execution of all 
committed atomic blocks. If no thread is currently inside an atomic block (that 
is, $\neg\mathcal{A}(\sigma)$), then we expect that the operations on the shadow store are a serialization 
of the operations on the normal store, and hence that $\sigma = \rho$. If the serial 
execution on the shadow store and the interleaved execution on the normal store 
yield different results (that is, $\sigma \neq \rho$), then we cannot verify that the execution 
sequence is serializable, and the instrumented execution goes wrong via the rule 
[wrong]. 
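The instrumented semantics can be sketched as a small explicit-state explorer. The sketch below is only an illustration under stated assumptions, not the Promela encoding used in the evaluation: it uses a hypothetical toy program in which each thread runs one atomic block `acquire; t := x; x := t + 1; release`, with the write chosen as the commit action, and a flag `use_lock` that turns the acquire into a no-op to obtain a racy variant.

```python
OUT, PRE, POST = "Outside", "PreCommit", "PostCommit"
PHASE = [OUT, PRE, PRE, POST, OUT]       # phase of a thread as a function of its pc

def step(store, i, use_lock):
    """T_i: one step of thread i on `store`, or None if no step is enabled."""
    m, x, threads = store
    pc, t = threads[i]
    if pc == 0:                          # acquire m (a no-op in the racy variant)
        if use_lock and m:
            return None                  # blocked: the lock is held
        m = use_lock
    elif pc == 1:
        t = x                            # read
    elif pc == 2:
        x = t + 1                        # write: the commit action
    elif pc == 3:
        m = False                        # release
    else:
        return None                      # terminated
    return (m, x, threads[:i] + ((pc + 1, t),) + threads[i + 1:])

def phase(store, i):
    return PHASE[store[2][i][0]]

def active(store):
    """A(store): some thread is inside an atomic block."""
    return any(phase(store, i) != OUT for i in range(len(store[2])))

def successors(sigma, rho, use_lock):
    n = len(sigma[2])
    if active(rho):                                      # [shadow]
        i = next(j for j in range(n) if phase(rho, j) != OUT)
        rho2 = step(rho, i, use_lock)
        return [(sigma, rho2)] if rho2 is not None else []
    if not active(sigma) and sigma != rho:               # [wrong]
        return ["wrong"]                 # enough for detection; other rules are skipped
    out = []
    for i in range(n):
        sigma2 = step(sigma, i, use_lock)
        if sigma2 is None:
            continue
        if phase(sigma2, i) == PRE or phase(sigma, i) == POST:
            out.append((sigma2, rho))                    # [pre-commit] / [post-commit]
        else:                                            # [commit]
            rho2 = step(rho, i, use_lock)
            if rho2 is not None:
                out.append((sigma2, rho2))
    return out

def goes_wrong(n_threads, use_lock):
    """Explore all instrumented executions from (sigma_0, sigma_0)."""
    init = (False, 0, ((0, None),) * n_threads)
    seen, todo = {(init, init)}, [(init, init)]
    while todo:
        for nxt in successors(*todo.pop(), use_lock):
            if nxt == "wrong":
                return True
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

print(goes_wrong(2, use_lock=True))
print(goes_wrong(2, use_lock=False))
```

With the lock in place no instrumented execution goes wrong, while the racy variant reaches a quiescent state in which the normal store and the shadow store disagree on `x`.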

Since atomic blocks are executed in a serial manner on the shadow store, two 
threads should never be simultaneously executing atomic blocks on the shadow 
store. Hence, we say the shadow store p is well-formed if 

$\forall i, j \in Tid.\ (\mathcal{A}_i(\rho) \neq Outside \wedge \mathcal{A}_j(\rho) \neq Outside \Rightarrow i = j)$ 

The following lemma states that the instrumented semantics performs a sequence 
of interleaved operations on the normal store, and a sequence of serial operations 
on the shadow store. 

Lemma 1. If $(\sigma, \rho) \Rightarrow^* (\sigma', \rho')$ and $\rho$ is well-formed then $\sigma \rightarrow^* \sigma'$ and $\rho \mapsto^* \rho'$ 
and $\rho'$ is well-formed. 

Proof. We prove the case where $(\sigma, \rho) \Rightarrow (\sigma', \rho')$ via a single transition by case 
analysis on the transition rule used. This proof generalises to longer transition 
sequences by induction. 

- [pre-commit] or [post-commit]: Since $T_i(\sigma, \sigma')$, we have $\sigma \rightarrow \sigma'$. In addition, 
since $\rho' = \rho$, we trivially have $\rho \mapsto^* \rho'$ and $\rho'$ is well-formed. 

- [commit]: Since $T_i(\sigma, \sigma')$, we have $\sigma \rightarrow \sigma'$. In addition, since $\neg\mathcal{A}(\rho)$, we 
have $\mathcal{A}_j(\rho) = Outside$ for all $j \in Tid$. Together with $T_i(\rho, \rho')$, we then have 
$\rho \mapsto \rho'$. Also from $T_i(\rho, \rho')$, we have $\mathcal{A}_j(\rho') = Outside$ for all $j \neq i$, so $\rho'$ is 
well-formed. 

- [shadow]: Since $\mathcal{A}_i(\rho) \neq Outside$ and $\rho$ is well-formed, we have $\mathcal{A}_j(\rho) = Outside$ 
for all $j \neq i$. Together with $T_i(\rho, \rho')$, we then have $\rho \mapsto \rho'$ and that 
$\rho'$ is well-formed. Finally, since $\sigma' = \sigma$, we have $\sigma \rightarrow^* \sigma'$. □ 

In addition, the instrumented semantics includes all evaluation sequences 
possible under the standard semantics, except that the instrumented semantics 
records additional information in the shadow store. We assume that all atomic 
blocks terminate, that is, if $\mathcal{A}_i(\rho_1) \neq Outside$ then there exist $\rho_2, \ldots, \rho_n$ such 
that $\mathcal{A}_i(\rho_n) = Outside$ and $T_i(\rho_k, \rho_{k+1})$ for all $1 \leq k < n$. 

Lemma 2. If $\sigma \rightarrow^* \sigma'$ and atomic blocks terminate then for all well-formed $\rho$ 
there exists $\rho'$ such that $(\sigma, \rho) \Rightarrow^* (\sigma', \rho')$. 

Proof. We first prove the base case where $\sigma \rightarrow \sigma'$ via a single transition 
because $T_i(\sigma, \sigma')$. The proof then generalises to longer transition sequences via 
induction. 

- Suppose $\neg\mathcal{A}(\rho)$ and $\mathcal{A}_i(\sigma') = PreCommit$. In this case $(\sigma, \rho) \Rightarrow (\sigma', \rho)$ via 
[pre-commit]. 

- Similarly, suppose $\neg\mathcal{A}(\rho)$ and $\mathcal{A}_i(\sigma) = PostCommit$. In this case $(\sigma, \rho) \Rightarrow (\sigma', \rho)$ 
via [post-commit]. 

- Suppose $\neg\mathcal{A}(\rho)$ and neither of the above cases hold. That is, $\mathcal{A}_i(\sigma') \neq PreCommit$ 
and $\mathcal{A}_i(\sigma) \neq PostCommit$. Then $(\sigma, \rho) \Rightarrow (\sigma', \rho')$ via [commit], 
where $T_i(\rho, \rho')$. 

- Suppose $\mathcal{A}(\rho)$. Then there exists $i \in Tid$ such that $\mathcal{A}_i(\rho) \neq Outside$. Since 
atomic blocks terminate, there exist $\rho_2, \ldots, \rho_n$ such that $\mathcal{A}_i(\rho_n) = Outside$ 
and $T_i(\rho, \rho_2)$ and for all $1 < k < n$, we have $\mathcal{A}_i(\rho_k) \neq Outside$ and 
$T_i(\rho_k, \rho_{k+1})$. Hence, $(\sigma, \rho) \Rightarrow^* (\sigma, \rho_n)$ via a sequence of [shadow] transitions 
to a state $(\sigma, \rho_n)$ where $\neg\mathcal{A}(\rho_n)$, and one of the above cases then applies to 
this state. □ 

Finally, any instrumented execution that does not go wrong satisfies the 
atomicity requirement. 

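Theorem 1. If $\sigma \rightarrow^* \sigma'$ and $\neg\mathcal{A}(\sigma)$ and $\neg\mathcal{A}(\sigma')$ and atomic blocks terminate, 
then either 

1. $\sigma \mapsto^* \sigma'$, or 

2. $(\sigma, \sigma) \Rightarrow^* wrong$. 

Proof. Since $\sigma \rightarrow^* \sigma'$, by Lemma 2 there exists $\rho'$ such that $(\sigma, \sigma) \Rightarrow^* (\sigma', \rho')$. 
Since atomic blocks terminate, there exists $\rho''$ such that $(\sigma', \rho') \Rightarrow^* (\sigma', \rho'')$ and 
$\neg\mathcal{A}(\rho'')$. If $\sigma' \neq \rho''$ then $(\sigma', \rho'') \Rightarrow wrong$ via [wrong], yielding case 2 of this 
theorem. Otherwise $\sigma' = \rho''$ and $\sigma \mapsto^* \sigma'$ by Lemma 1, yielding case 1 of this 
theorem. □ 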



Thus, given any standard execution $\sigma_0 \rightarrow^* \sigma'$ (where $\neg\mathcal{A}(\sigma_0)$ and $\neg\mathcal{A}(\sigma')$ 
and $\sigma_0$ is well-formed), we can inspect the corresponding instrumented execution 
$(\sigma_0, \sigma_0) \Rightarrow^* (\sigma', \rho')$, which must exist by Lemma 2. If this instrumented execution 
does not go wrong, then by Theorem 1, we know that the original execution 
$\sigma_0 \rightarrow^* \sigma'$ is equivalent to some serial execution $\sigma_0 \mapsto^* \sigma'$. Thus, by using model 
checking to ensure that no instrumented execution goes wrong, we can therefore 
verify that the program satisfies the atomicity requirement. 

5 Evaluation 

We have applied commit-atomicity to verify several example programs that could 
not be handled by earlier atomicity-checking tools based on reduction. This 
section presents the example programs we used and reports on the performance 
of our verification technique. 



5.1 Busy-Waiting Lock Acquire 

Our first benchmark uses the busy-waiting lock acquire function described in 
Section 2. This benchmark contains an integer variable data protected by the 
mutex m. The code for each thread contains a loop that first acquires the mutex, 
updates data, and then releases the mutex. Our correctness specification is that 
each iteration of the loop should appear to execute atomically (and hence two 
threads should never update data at the same time). This correctness specification 
is included in the code via the construct atomic { ... }. We consider 
two versions of this benchmark in order to calibrate the ability of our technique 
to handle large procedures. In the first version, acquire1, the critical section 
only contains a single line of code, whereas in the second version, acquire2, the 
critical section contains 100 lines of code that manipulate data. 

Busy-waiting lock acquire benchmark 

Variables: 
    boolean m; 
    int data; 

Initially: 
    m := false; 
    data := 0; 

Code for each thread: 
    while (true) { 
        atomic { 
            acquire();   // see impl in Section 2 

            // critical section 
            // read-modify-write data 

            m := false; 
        } 
    } 



5.2 Dekker’s Mutual Exclusion Algorithm 



Our second example is Dekker’s algorithm, a classic algorithm for mutual exclu- 
sion between two threads that uses subtle synchronization. The critical section 
of each thread updates a shared variable data. Our correctness specification is 
that, because the mutual exclusion code is correct, the body of the while loop of 
each thread should appear to execute atomically. This specification is expressed 
using the construct atomic { ... }. 



Dekker’s mutual exclusion benchmark 

Variables: 
    boolean a1; 
    boolean a2; 
    int data; 

Initially: 
    a1 := false; 
    a2 := false; 
    data := 0; 

Thread1: 
    while (true) { 
        atomic { 
            a1 := true; 
            if (¬a2) { 
                // critical section 
                // read, write data 
            } 
            a1 := false; 
        } 
    } 

Thread2: 
    while (true) { 
        atomic { 
            a2 := true; 
            if (¬a1) { 
                // critical section 
                // read and write data 
            } 
            a2 := false; 
        } 
    } 
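The mutual exclusion property that this benchmark relies on can be checked by exhaustively exploring the interleavings of the two flag protocols. The following sketch assumes sequentially consistent memory and abstracts away the variable data; the program-counter encoding is a hypothetical choice. It checks mutual exclusion only, not atomicity.

```python
def step(state, i):
    """One step of thread i (0 or 1); every program counter has one enabled step."""
    flags, pcs = list(state[0]), list(state[1])
    if pcs[i] == 0:                      # a_i := true
        flags[i], pcs[i] = True, 1
    elif pcs[i] == 1:                    # if (not a_other): enter, otherwise skip
        pcs[i] = 3 if flags[1 - i] else 2
    elif pcs[i] == 2:                    # critical section: read, write data
        pcs[i] = 3
    else:                                # a_i := false, then loop back
        flags[i], pcs[i] = False, 0
    return (tuple(flags), tuple(pcs))

def reachable(init):
    """All states reachable by arbitrary interleavings of the two threads."""
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for i in (0, 1):
            nxt = step(s, i)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

init = ((False, False), (0, 0))
states = reachable(init)
# Mutual exclusion: the two threads are never both at pc 2 (the critical section).
mutual_exclusion = all(pcs != (2, 2) for _, pcs in states)
print(mutual_exclusion)
```

A thread that finds the other thread's flag set simply skips its critical section, which is why no waiting loop is needed in this simplified protocol.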



5.3 Transaction Retry 

Our third benchmark re-uses the procedure do_transaction from Section 2, 
with the requirement that each transaction should be performed atomically. 
Transaction benchmark 

Variables: 
    boolean mutex; 
    int data; 

Initially: 
    mutex := false; 
    data := 0; 

Code for each thread: 
    while (true) { 
        atomic { 
            do_transaction(); 
        } 
    } 



5.4 Bluetooth Device Driver 

The Bluetooth benchmark is a simplified model of one of the Bluetooth device 
drivers in Windows NT described in [22]. There are two dispatch functions 
in this simplified device driver: BCSP_PnpAdd and BCSP_PnpStop. The function 
BCSP_PnpAdd is called by the operating system to perform I/O in the driver. 
The second dispatch function BCSP_PnpStop is called by the operating system to 
stop the driver. In our benchmark, one thread calls BCSP_PnpStop, and all the 
remaining threads call BCSP_PnpAdd. 

Our correctness specification is that each dispatch function should execute 
atomically. In particular, each call to BCSP_PnpAdd should either operate nor- 
mally or return immediately because the device driver is already stopped. 

5.5 Experimental Results 

We tested each benchmark using various numbers of concurrent threads, as 
shown in Figure 1. For each of the five benchmarks, we manually generated 
two Promela programs that capture the semantics of the benchmarks under the 
standard semantics (— >) and instrumented semantics (=>), respectively. Figure 1 
compares the cost of model checking these benchmarks under these two seman- 
tics. For each benchmark/threads/semantics configuration, the figure reports the 
size of the reachable state space and the memory and time required for model 
checking. An entry of "-" indicates that the SPIN model checker ran out of memory 
on that configuration. We performed these experiments under Windows XP 
on a 1.7GHz Pentium M laptop with 1GB of memory. 

For each variable x in the original program, we declared two variables x 
and s_x in the Promela code for the instrumented semantics, to represent the 
value of x in the normal store and shadow store, respectively. Thus, the size of 
each state in the Promela code for the instrumented semantics is twice as large 
as for the standard semantics. In addition to this increase in the size of each 
state, the experimental results in Figure 1 indicate that the size of the reachable 
state space for the instrumented semantics is significantly larger than for the 
standard semantics. That is, the overhead of atomicity checking contributes to 
the state explosion problem on these benchmarks. However, commit-atomicity 
does provide a means of verifying atomicity in these benchmarks, which could 
not be accomplished with previous reduction-based tools. In addition, our results 
for the acquire2 benchmark indicate that this technique is capable of handling 
moderately-large procedures (in this case containing 100 lines of code). 

                        Standard semantics ($\rightarrow$)         Instrumented semantics ($\Rightarrow$) 
Benchmark    Threads    states   space (MB)   time (s)     states   space (MB)   time (s) 
dekker          2         3104       1.7        0.02         3601       1.8        0.05 
acquire1        2          135       1.6        0.02          278       1.6        0.02 
acquire1        3          468       1.6        0.02         1795       1.7        0.03 
acquire1        4         4361       1.9        0.05        20935       3.4        0.16 
acquire1        5        16369       6.0        0.15       118242      16.9        0.99 
acquire1        6        62806      11.5        0.58       658038     113.4        7.21 
acquire1        7       299952      70.8        4.42            -         -           - 
acquire2        2         3335       1.7        0.03         8278       2.0        0.04 
acquire2        3        12864       2.4        0.09        58795       5.4        0.24 
acquire2        4       153854      45.6        1.32       714359      96.7        4.91 
acquire2        5       541601      85.3        5.97            -         -           - 
transaction     2          836       1.6        0.02         4730       1.9        0.05 
transaction     3        25557       6.0        0.11       532457      78.1        2.21 
transaction     4       826627      99.3        4.68            -         -           - 
bluetooth       2           91       1.6        0.02          116       1.6        0.02 
bluetooth       3          568       1.6        0.02         1187       1.6        0.03 
bluetooth       4         4762       1.9        0.05        16383       3.1        0.09 
bluetooth       5        47163       5.2        0.13       271111      33.1        1.46 
bluetooth       6       527668      48.6        1.79            -         -           - 

Fig. 1. Summary of benchmark programs and model checking performance. 
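The state-space overhead reported in Figure 1 can be quantified with a short calculation. The sketch below copies the state counts from the figure for the configurations that completed under both semantics and computes the ratio of instrumented to standard states; the names `FIGURE_1` and `blowup` are hypothetical.

```python
# (benchmark, threads, states under standard semantics, states under instrumented semantics)
FIGURE_1 = [
    ("dekker", 2, 3104, 3601),
    ("acquire1", 2, 135, 278),
    ("acquire1", 3, 468, 1795),
    ("acquire1", 4, 4361, 20935),
    ("acquire1", 5, 16369, 118242),
    ("acquire1", 6, 62806, 658038),
    ("acquire2", 2, 3335, 8278),
    ("acquire2", 3, 12864, 58795),
    ("acquire2", 4, 153854, 714359),
    ("transaction", 2, 836, 4730),
    ("transaction", 3, 25557, 532457),
    ("bluetooth", 2, 91, 116),
    ("bluetooth", 3, 568, 1187),
    ("bluetooth", 4, 4762, 16383),
    ("bluetooth", 5, 47163, 271111),
]

def blowup(rows):
    """Ratio of instrumented to standard reachable states per configuration."""
    return {(name, n): instr / std for name, n, std, instr in rows}

ratios = blowup(FIGURE_1)
for (name, n), ratio in ratios.items():
    print(f"{name:12s} {n} threads: {ratio:5.1f}x")
```

On these numbers the ratio grows with the number of threads within each benchmark, ranging from a little over 1x for dekker to roughly 20x for transaction with three threads.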

During our experiments, the bluetooth benchmark initially went wrong under 
the instrumented semantics, revealing the same synchronization bug that was 
discovered in [22] via an assertion violation. After fixing this bug, none of the 
benchmarks went wrong under the instrumented semantics, indicating that all 
these programs satisfy their intended atomicity properties. 

6 Related Work 

Lipton [16] first proposed reduction as a way to reason about concurrent pro- 
grams without considering all possible interleavings. Although he focused pri- 
marily on checking deadlock freedom, reduction has subsequently been extended 
to support proofs of general safety and liveness properties [6,3,15,4,19]. 

Reduction has been applied to verify atomicity in a static type system for 
Java programs [10,9]. This type system for atomicity was inspired by the Calvin- 
R [11] static checking tool for multithreaded programs, which relates each pro- 
cedure’s specification to its implementation via a combination of simulation and 
reduction. The Atomizer is a dynamic analysis tool for detecting atomicity vio- 
lations by running an instrumented version of the program [8]. In recent work, 
Wang and Stoller [23] also developed several algorithms for checking atomicity 
dynamically. The use of model checking for verifying atomicity is being explored 
by Hatcliff et al. [13], and they present two approaches, based on Lipton’s theory 
of reduction and partial order reductions, respectively. Their experimental results 
suggest that verifying atomicity via model-checking is feasible for unit-testing. 
All of these approaches can only verify the atomicity of reducible procedures, 
and thus are insufficient for the examples considered in this paper. 

Atomicity is a semantic correctness condition for multithreaded software. It 
is related to strict serializability [20], a correctness condition for database transactions, 
and linearizability [14], a correctness condition for concurrent objects. 
It is possible that techniques for verifying atomicity can be leveraged to develop 
lightweight checking tools for related correctness conditions. 

Many other researchers have proposed using atomicity as a language primitive, 
essentially implementing the serial semantics $\mapsto$. Lomet [18] first proposed 
the use of atomic blocks for synchronization. The Argus [17] and Avalon [7] 
projects developed language support for implementing atomic objects. Persistent 
languages [1,2] attempt to augment atomicity with data persistence in order to 
introduce transactions into programming languages. A more recent approach to 
supporting atomicity uses lightweight transactions implemented in the run-time 
system [12]. An alternative is to generate synchronization code automatically 
from high-level specifications [5]. 

7 Conclusion 

In an effort to avoid errors due to unexpected interactions between concurrent 
threads, programmers often design procedures that are intended to be atomic. 
Reduction suffices to verify the atomicity of procedures that use straightfor- 
ward synchronization, but is often inadequate for more subtle synchronization 
disciplines. 

This paper introduces a novel technique called commit-atomicity for verifying 
atomicity in multithreaded programs. This technique is based on executing serial 
and non-serial versions of the programs simultaneously, and checking that both 
versions yield the same final state. This technique is capable of verifying the atomicity 
of a variety of procedures, including procedures that could not be handled 
using existing atomicity-checking tools based on reduction. 

Commit-atomicity does introduce a significant model checking overhead. An 
important area for future research is the development of hybrid atomicity-checking 
tools that use reduction to verify many procedures, but are capable 
of leveraging commit-atomicity as necessary to verify procedures that use more 
complicated synchronization disciplines. 
Abstract. As software systems become increasingly complex, there is growing 
interest in the use of formal techniques to obtain higher assurance in their 
correctness. The most commonly used tools involve model-checking, such as 
SMV and Spin. But modeling complex systems with a high degree of fidelity 
implies exceedingly large state spaces that must be analyzed. These state spaces 
are typically too large for single processing nodes, in spite of great advances in 
memory reduction techniques. Moreover, approximation techniques such as 
hash compaction are less well-received where safety-critical systems are 
concerned. Effective distribution of the problem over many processing nodes 
has the potential of supporting the huge state spaces. Since our primary interest 
is in safety-critical software, we have spent considerable time evaluating the 
performance of distributed implementations of Spin in this context. In this 
paper, we present our analysis of PSPIN, a distributed implementation of Spin. 
We identify key measures of effectiveness, and evaluate PSPIN with respect to 
these measures. We also present an alternative approach to partitioning that 
performs comparably with respect to all measures, and is up to orders of 
magnitude faster. Finally, we consider the question of which measures have the 
greatest impact on peak memory, a measure that is most critical to effective 
distribution. 



1 Introduction 

Software applications used to assist the flight of aircraft have long been considered 
safety-critical, and as such have been required to be developed using strict processes 
to maintain the high standards of the end products. These processes rely on reviews, 
simulations and testing to achieve those high standards. A number of factors have 
resulted in Formal Methods tools being seriously considered as part of this mix. The 
first of these reasons is the increasing functionality being demanded from the 
software, resulting in a steady increase in complexity. The second is the pressure to 
bring the product to market as quickly as possible. Third is the growing realization 
that existing processes are inadequate. It is not possible to test some requirements 
explicitly (requirements such as all threads are guaranteed their budgets by the
operating system). It is also not possible to identify and test all corner cases in
complex systems - especially in systems that involve parallelism or distribution. 
Finally, there is growing data that early and effective use of formalisms in the 
development process reduces the chances of major rework late in the development 
process. 

Among the formal analysis techniques available, we have found model checking, 
and in particular, Spin[7], to be the most amenable to integration with safety-critical 
software development processes. As with other model-checking tools, Spin’s verifier 
can automatically analyze models against requirements. In addition, we have found 
Spin’s modeling language, PROMELA, to be the easiest to translate to from the C++
code being used in our studies. Finally, Spin has been used extensively and is highly 
trusted, and it can check for a wide range of properties. 

We have successfully used Spin to analyze various embedded software in the 
past[4]. Our current work with the Deos™ real-time operating system[8] has pushed 
Spin to its limits. In particular, as features were added to the model, the memory 
requirements for full verification far exceeded the 4GB memory limit imposed by the 
32-bit Spin implementation. This was in spite of using Spin’s memory reduction 
techniques such as state-vector compression and partial order reduction, and also 
predicate abstractions. 

One solution to deal with the memory problems is to look at distributing the 
problem over many processing nodes. In theory, parallelization should cut the 
computation time as well as the amount of memory per node. However, it is believed
[12] that due to fine-grained communication requirements, distributed model
checking is inherently unscalable, so parallelization can actually result in a
significant performance slowdown. Thus, distribution of Spin is most effective when
the memory overhead of distribution is low, and the time overhead is not prohibitively
large.

In this paper, we evaluate the performance of a distributed implementation of Spin, 
namely PSPIN[12]. This is particularly interesting as it is implemented as a wrapper 
around the traditional Spin, thereby reducing the probability of errors in the 
verification. In particular, we analyze the memory and time performance of this 
implementation. We identify measures of effectiveness for this category of distributed 
verifiers. We present an alternative scheme that partitions the state space according to 
the value of automatically selected elements of the state vector. We provide results 
that show that the new scheme performs comparably with respect to all these 
measures while being over 25 times faster in verification time. Finally, we
consider the issue of memory overhead for distribution and identify potential 
optimization measures that have the most effect on this. 

This paper is organized as follows. In the next section, we provide an overview of 
the Deos real-time operating system, and discuss briefly our approach to modeling it. 
Next, we analyze the performance of PSPIN, and identify the key measures by which 
we can judge the effectiveness of distribution. Then, we present our approach to 
partitioning and compare it with PSPIN’s partitioning schemes. We conclude the 
paper with related work and directions for further study. 
2 The Deos RTOS and Its Modeling

The Deos real-time operating system was developed by Honeywell for use in the 
Primus Epic avionics suite for business, regional, and commuter jet aircraft. Deos 
hosts many safety-critical applications in these aircraft, including primary flight 
controls, autopilots, and displays. 

Deos is a microkernel-based real-time operating system that supports flexible 
Integrated Modular Avionics applications by providing both space partitioning at the 
process level and time partitioning at the thread level. Space partitioning ensures that 
no process can modify the memory of another process without authorization, while 
time partitioning ensures that a thread’s access to its CPU time budget cannot be 
impaired by the actions of any other thread. Deos supports many advanced features 
such as dynamic creation and deletion of processes and threads, reuse of unused 
thread budgets (also known as slack time), aperiodic interrupts, synchronization 
mechanisms such as mutexes, etc. 

Deos is an interesting problem for this study for many reasons. First, analysis of 
source code is an unobtrusive way for formal analysis tools to gain acceptance within 
existing development processes. Second, Deos has a number of interesting properties 
that are well-suited for formal analysis. These include: (1) Time partitioning, which is 
a property that is practically impossible to test; and (2) Function pre-conditions, 
inserted in the code as comments that need to be checked at every invocation of the 
function. Finally, the Deos model has an atypical asymmetric model with one very 
large process and many smaller processes, which is a more challenging distribution 
problem. 

One requirement for our approach was the ability to automatically generate 
verification models, since it was expected that engineers with minimal training in 
formal analysis techniques would be the users. Therefore, though our Deos model was 
constructed by hand, automated translation was a key consideration in creating the 
model. This resulted in a model with a direct mapping (almost line-to-line 
correspondence) to the source code, which had the added benefit of being easier to 
review by the developers. 

The Deos model consists of three parts - the kernel, user threads and the 
environment. The kernel corresponds to the code translated from C++. It provides 
services to the user threads and interfaces with the hardware environment. It is the 
most complex part of the model. The user threads are very simple and just invoke 
various calls to the kernel and respond to messages (such as preemption) from the
kernel. The environment provides the hardware services such as timers and interrupts. 
Though the environment is not as complex as the kernel, it is more difficult to model 
realistic hardware behavior in pure software. In our tests, we have found an event- 
based simulation-like approach to be the most suitable for modeling the environment. 
The Deos model initially consisted of the basic rate-monotonic scheduling algorithm 
with support for multiple threads and periods. Various features were incrementally 
added to the base model, including slack time, asynchronous interrupts, mutexes and 
overhead accounting. 

The model incorporates a number of memory optimizations. These optimizations 
fall into two categories - those that reduce the size of the state vector, and those that 
reduce the number of distinct states seen by the verifier. State vector size is reduced 
by replacing all integers with bytes, reusing temporary variables, and by judicious use
of hidden variables. The size of the state space is reduced by the use of predicate 
abstractions where possible, and judicious use of configurations (number of threads, 
their budgets, whether the threads are slack-enabled, etc.). 

Our initial models could be exhaustively verified by Spin in 335 MB of memory. 
But by the time slack and interrupt threads were added, the state space became too 
large to be exhaustively verified in 4GB of memory. Though we continued to use 
Spin’s approximation techniques such as bit-state hashing as debugging aids, these 
techniques are a harder sell for incorporation in the development processes of safety- 
critical systems. This led to our interest in other tools that can handle these large state 
spaces, and in particular, PSPIN. 



3 PSPIN Overview 

PSPIN is a wrapper around Spin that distributes the memory used for verification over 
many processing nodes (typically workstations in a network). The overall state space 
is partitioned into as many state sub-domains as the number of network nodes. Each 
node is assigned a different state sub-domain, and holds only the states that belong to 
that subset of the state space. During the verification run, each node computes the 
successors of the states it holds, and if it finds any successors belonging to other state 
sub-domains, it sends them to the nodes that are in charge of processing them. Since 
this is implemented as a wrapper around Spin, it is compatible with some of Spin’s 
memory reduction techniques such as state compression. Its compatibility with partial 
order reduction is discussed later in this paper. 
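
The division of work described above can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not PSPIN's code: `successors` and `owner` are hypothetical stand-ins for the model's transition relation and the partition function, and message passing between nodes is imitated with one queue per node.

```python
from collections import deque

def distributed_explore(initial, successors, owner, num_nodes):
    """Simulate partitioned reachability; returns (per-node state sets, messages sent)."""
    stored = [set() for _ in range(num_nodes)]     # states held by each node
    queues = [deque() for _ in range(num_nodes)]   # incoming work for each node
    messages = 0

    queues[owner(initial)].append(initial)
    while any(queues):                             # some node still has work
        for node in range(num_nodes):
            while queues[node]:
                state = queues[node].popleft()
                if state in stored[node]:
                    continue                       # already visited on this node
                stored[node].add(state)
                for nxt in successors(state):
                    dest = owner(nxt)
                    if dest != node:
                        messages += 1              # successor crosses a sub-domain boundary
                    queues[dest].append(nxt)
    return stored, messages

# Toy model (an assumption for illustration): ten integer states, two nodes.
toy_successors = lambda s: [(s + 1) % 10, (2 * s) % 10]
stored, messages = distributed_explore(0, toy_successors, lambda s: s % 2, 2)
print([len(part) for part in stored], messages)
```

Each state is stored only by its owner, and the message count grows with the number of transitions whose endpoints have different owners, which is why the locality of the partition function matters for run time.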

There are a number of challenges associated with partitioning vast and unknown 
state spaces: 

1. It is not efficient to construct a structural model of the entire state space. 
Therefore, all schemes must utilize a predictive approach. The use of such an 
approach can impact guarantees of optimality that may characterize particular 
partitioning schemes. (Although it is still possible to provide guarantees of 
optimality for a sample of the state space.) Furthermore, it is often not possible to 
guarantee communications and load balance bounds for predictive schemes. 

2. Since the partitioning function is called frequently (every time a child state is 
examined), processing nodes should be able to compute the owner node of a 
particular state quickly using purely local information. At the very least, the 
partitioning scheme must require non-local information infrequently. 

3. As the state space is larger than the available memory of a single processing node, 
it is not feasible for each processing node to maintain an array that maps every 
state to its assigned node. (We refer to such an array as a partition array.) Instead, 
it is necessary to encode and decode the partitioning using some technique whose 
memory requirement is much less than the size of the state space and that is 
relatively fast. It is typically the case that not every possible partition array is 
representable by a particular encoding technique. Therefore, the reachable 
partitioning space is effectively reduced by such methods. This effect impacts 
optimality, as optimal partitionings may not reside within the reachable solution 
space. 
PSPIN implements three partitioning functions: Global Hash (GHP), Local Hash
(LHP), and a graph partitioning-based method that we refer to as Source Code 
Partitioning (SCP). These address the above challenges in various ways. The Global 
Hash partitioning function maps states to processing nodes using a computation that 
involves the whole state vector. For example, the computation could be as simple as 
adding the values of all the variables in the state vector and computing its modulo 
with the number of processing nodes. Figure 1 illustrates this scheme as well as the 
Local Hash scheme. Here, a state vector v of size m is divided into n sub-vectors, one 
for each process in the model. Figure 1 shows the resulting processing node 
computation for this state vector using the Global Hash scheme. (Note that the 
computation is incomplete, as some of the state vector is not shown.) Global Hash 
requires only local information (i.e., the state vector). It can be shown to balance the 
load with high probability given a few reasonable assumptions about the distribution 
of the values of the state vector. However, this partitioning approach displays little or 
no locality with respect to how the states will be partitioned. That is, the probability 
that any two states that are adjacent in the state space will be mapped to the same 
processing node approaches zero as the number of nodes approaches infinity. 



[Figure 1: the state vector v, divided into sub-vectors for Process 1, Process 2, ..., Process n, with the resulting computations:]

Global_Hash(v) = (315 + missing values) / n
Local_Hash(v, 1) = 308 / n
Local_Hash(v, 2) = 4 / n
Local_Hash(v, n) = 3 / n

Fig. 1. Illustration of Global Hash and Local Hash partitioning schemes

Local Hash performs a similar computation to Global Hash, but utilizes only the 
values of a portion of the state vector belonging to a single Spin process, p. Figure 1 
shows the resulting processing node computations for the state vector v when p is set to 
1, 2, and n. Local Hash requires only local information. It increases locality compared 
to Global Hash, as any two adjacent states are guaranteed to be mapped to the same 
processing node if PSPIN is not currently executing the selected process p. So given a 
model consisting of processes of roughly equal sizes, the probability that any two 
adjacent states will be mapped to the same processing node approaches one as the 
number of processes approaches infinity. However, as the portion of the state vector that
is associated with process p is small compared to the entire state vector, the modulus 
scheme is less likely to result in good load balance. Furthermore, as the number of
processes increases, the Local Hash scheme will tend to obtain worse load balancing 
results than the Global Hash scheme. 
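
The two hash schemes can be sketched as follows. This is a minimal sketch, assuming the simple "sum of values, modulo the number of processing nodes" computation mentioned in the text; representing the state vector as a list of per-process tuples is an assumption made for illustration, and PSPIN's actual hash functions may differ.

```python
def global_hash(state_vector, num_nodes):
    """Owner node computed from every value in the state vector."""
    return sum(sum(proc) for proc in state_vector) % num_nodes

def local_hash(state_vector, p, num_nodes):
    """Owner node computed only from the values belonging to process p."""
    return sum(state_vector[p]) % num_nodes

v = [(3, 5), (2, 1), (7,)]   # three processes
w = [(3, 5), (2, 2), (7,)]   # a successor of v in which only process 1 moved

print(global_hash(v, 4), global_hash(w, 4))
print(local_hash(v, 0, 4), local_hash(w, 0, 4))
```

Because the transition from `v` to `w` does not touch process 0, `local_hash` with `p = 0` keeps both states on the same node, whereas `global_hash` is free to move the successor elsewhere. This is the locality difference described above.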

The third partitioning scheme available in PSPIN, Source Code Partitioning, is 
based on graph partitioning. In this scheme, the graph representing the source code for 
a selected process p is partitioned (and not a graph modeling the state space or a sample 
of the state space). Figures 2 and 3 illustrate this approach. Figure 2 shows a Promela 
model consisting of two processes, init and P1.



int sum;

proctype P1()
{
         int i = 0;
         int evensum = 0;

         do
/*  1 */ :: sum < 7 ->
             do
/*  2 */     :: i < 10 ->
/*  3 */         if
/*  4 */         :: (i % 2 == 0) -> evensum = evensum + i;
                                    i++;
/*  5 */         :: (i % 2 != 0) -> i++;
                 fi
/*  6 */     :: else ->
                 break;
             od;
/*  7 */     sum++;
/*  8 */ :: else ->
             break;
         od;
/*  9 */ printf("sum=%d\tsum= %d\n", sum, evensum);
/* 10 */ }

init {
         sum = 0;
         run P1();
}



Fig. 2. Sample code to illustrate Source Code Partitioning 

Assume that the selected process is P1. (Note that using the init process would
result in a partitioning with extremely poor load balance.) Figure 3(a) illustrates the
source code graph for P1. This graph is constructed by PSPIN and models the possible
control flows for the model. The circled numbers next to each vertex refer to the 
relevant line of source code from Figure 2. PSPIN can optionally weight the vertices 
and edges of this graph. It does so by performing a prerun of Spin. Each time a 
forward transition is taken, the weight of the associated edge on the graph is 
incremented. Similarly for vertices, each time a line of source code is executed by 
Spin, the weight of the corresponding vertex is incremented. Figure 3(a) includes edge
weights, but for the sake of clarity, vertex weights are not shown. A simple method for 
computing the vertex weights is to add up the weights of all outgoing or incoming arcs. 

Figure 3(b) shows the abstract graph that is passed to the Metis Graph Partitioning 
Package [10]. Metis partitions this graph into a number of sub-domains equal to the 
number of processing nodes. (In this example, we have three processing nodes.) 
Metis returns a partition array similar to the one shown in Figure 3(b). This partition 
array is compiled into a lookup table for the PSPIN partition function. The input to 
this partition function is the value of the program counter (PC) element from the state 
vector for the selected process (P1). In Figure 3(b), when the value of the PC is 6, 7,
9, or 10, the state maps to node 0. When the value is 1, 2, or 8, the state maps to node
1. PC values of 3, 4, and 5 map to node 2. 
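
The lookup described above can be sketched in a few lines. The PC-to-node mapping is the one stated in the text; the state layout (a tuple with the selected process's PC at a known index) and the name `scp_owner` are assumptions made for illustration only.

```python
# PC value of the selected process -> owner node, as given for Figure 3(b).
PARTITION_ARRAY = {
    6: 0, 7: 0, 9: 0, 10: 0,
    1: 1, 2: 1, 8: 1,
    3: 2, 4: 2, 5: 2,
}

def scp_owner(state, selected_pc_index):
    """Owner node of a state, looked up from the selected process's PC."""
    return PARTITION_ARRAY[state[selected_pc_index]]

# Hypothetical state: (PC of P1, sum, i)
print(scp_owner((4, 0, 12), 0))
```

Since the table has one entry per PC value rather than one per state, every node can hold a full copy and compute owners without any communication.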




Fig. 3. (a) Source code graph for P1, and (b) the abstract graph that is passed to Metis



The key idea behind Source Code partitioning is that the structure of the state space 
can be predicted given the flow control structure of one of the processes. By 
partitioning the source code graph to minimize the total weight of the control flow 
edges that are cut by the partitioning (as Metis does), the resulting partitioning should 
demonstrate good locality. Furthermore, by applying vertex weights that correspond 
to the relative frequency that source code lines are executed, the state space can be 
load balanced. That is, source code lines that are likely to be executed frequently (for 
instance, the code within a nested loop), will be likely to result in proportionally more 
states in the state space than less frequently executed code. So taking this information 
into account when partitioning the source-code graph can result in good load balance 
for PSPIN. Finally, the mapping of a state to its owner node can be computed using 
purely local information if the Metis partition array is duplicated on every node. In
this case, the value of the appropriate PC in the state vector is used as the index to this 
local copy of the partition array. 



3.1 PSPIN Evaluation with Small Deos Model 

The results from our initial evaluation of PSPIN’s three partitioning schemes on a
model of Deos over 2 and 4 processors are shown in Table 1. Note that GHP refers to
Global Hash partitioning, LHP refers to Local Hash partitioning and SCP refers to 
Source Code partitioning schemes. Also note that the memory values are those 
reported by Spin at the end of the verification run, with the values in bold face being 
the aggregate sum from all the individual processing nodes. The time is the user time 
reported by the time command. Finally, all these results were obtained with partial 
order reduction disabled. The following points are evident from the results: 

1. GHP results in even distribution of state space (memory) over all the processing 
nodes. But the verification time (run time) for GHP is much higher than the other 
two partitioning schemes, and also serial Spin. Moreover, the rate of increase in 
verification time with the increase in number of processing nodes is far too great 
for this approach to be practical. This is expected, as there is no locality in the 
partitioning scheme.

2. LHP on two nodes resulted in an approximately even distribution of states 
(memory) among the two nodes. The amount of memory used per node was less 
than half the memory used by serial Spin, and the run time was approximately the 
same as serial Spin. However, with 4 nodes, the results were more like running the 
test on two nodes, with one node not processing any states, and another node 
seeing very few states. Most of the work was done by two nodes. 

3. SCP over two nodes provided results that were comparable to those using local 
hash partitioning. But, over four nodes, two of the nodes did not perform any work. 

It is to be noted that the above results were obtained with partial order reduction 
(POR) turned off. This is significant, as we found in our tests that turning POR on 
results in inconsistent performance of distributed verification. With POR turned on, 
the sum total of states seen by all the nodes is typically larger than that seen by serial 
Spin. Traditional wisdom [12] has been that this is due to the fact that POR is less 
effective when distributed over many nodes. But when we compared the states seen 
by the distributed PSPIN with those seen by serial Spin, we found that while the 
distributed version saw some additional states, it also missed some states seen by 
serial Spin. This is a big problem, as it now becomes more difficult to validate the 
correctness of the distributed implementation. We are studying this issue further. For 
the purposes of this paper, all the results henceforth are with POR turned off. 

It is clear that both local hash and source code partitioning have better time and 
memory performance as compared to global hash partitioning. But neither technique
was making effective use of all available nodes. Repeated tests with increasing
prerun times for source code partitioning did not result in better use of all available 
nodes. Further study indicated that the problem was with the choice of Spin process 
used by the partitioning algorithm, which must be specified by the user. 
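
The uneven use of nodes noted above can be condensed into a single number per run. The following is a minimal sketch: the `imbalance` ratio (stored states on the busiest node divided by the per-node mean) is an illustrative metric assumed here, not one defined by PSPIN, and the counts are the per-node States Stored values for the four-node runs in Table 1.

```python
def imbalance(states_per_node):
    """Busiest node's stored states over the per-node mean; 1.0 is perfect balance."""
    mean = sum(states_per_node) / len(states_per_node)
    return max(states_per_node) / mean

table1_four_nodes = {
    "SCP": [201431, 0, 0, 275146],
    "LHP": [201431, 169646, 0, 105500],
    "GHP": [416332, 416197, 417253, 416202],
}

for scheme, counts in table1_four_nodes.items():
    print(f"{scheme}: {imbalance(counts):.2f}")
```

A ratio near 1 means the states are spread evenly, while a ratio near the number of nodes means one node holds almost everything. Note that this ratio says nothing about run time: a scheme can balance stored states well and still be slow because of message traffic.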
Table 1. Comparative performance of PSPIN's partitioning schemes

            Partition   Memory    Messages   States     Computation
            Function              Sent       Stored     Time

Serial      none        24.077               477722     17 seconds

2 Nodes     SCP         20.703    17824      476577     13 seconds
  Node 1                10.607    13468      201431
  Node 2                10.095    4356       275146

4 Nodes     SCP         25.738    17824      476577     10 seconds
  Node 1                10.607    13468      201431
  Node 2                2.518     0          0
  Node 3                2.518     0          0
  Node 4                10.095    4356       275146

2 Nodes     LHP         20.703    17824      476577     14 seconds
  Node 1                10.607    13468      201431
  Node 2                10.095    4356       275146

4 Nodes     LHP         26.659    88368      476577     12 seconds
  Node 1                10.607    13468      201431
  Node 2                7.638     4356       169646
  Node 3                2.518     0          0
  Node 4                5.897     70544      105500

2 Nodes     GHP         48.864    3125460    1467660    251 seconds
  Node 1                24.432    1562940    733953
  Node 2                24.432    1562520    733712

4 Nodes     GHP         66.495    5139780    1665980    2835 seconds
  Node 1                16.649    1283630    416332
  Node 2                16.547    1283730    416197
  Node 3                16.649    1287480    417253
  Node 4                16.649    1284940    416202

Our Promela model consists of one large process (corresponding to the Deos kernel) 
and a number of smaller processes. However, the default value for selecting the 
process whose source code graph is partitioned (variable NPROC in files table, t 
and ppan.c) corresponded to one of the smaller processes. We set the process 
selection variable (NPROC in SCP, HASHPROC in LHP) to correspond to the large 
Deos process (the Deos kernel) and the results were much better (as reflected in 
results shown later in this paper), with all the processors being well utilized. 



3.2 PSPIN Evaluation with Large Deos Model 

We next analyzed larger models, albeit models that could be verified using serial Spin 
within 4GB of memory, using PSPIN. We found that some of these runs did not 
complete, and had to be killed, for no apparent reason. Detailed analysis of reasons 
for PSPIN’s behavior pointed to issues of memory usage. PSPIN code was then 
modified to incorporate counters that kept track of memory allocation and 
deallocation within the Spin and PSPIN parts of the code. In addition, the new code 
also kept track of the corresponding maximum values (peak memory) reached by the
memory counters for both Spin and PSPIN during the verification runs. Experiments 
with this updated implementation indicated that actual memory usage, at some point 
in the verification run, far exceeded the amounts being reported by Spin, thus 
explaining why PSPIN failed on some large Deos models. 
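
The instrumentation described above amounts to a pair of counters per allocator. The following is a minimal sketch, assuming hypothetical `allocate` and `release` hooks that are called wherever memory is obtained or returned; the actual modification to the PSPIN sources is not shown in this paper.

```python
class MemoryCounter:
    """Tracks current and peak bytes for one allocator (e.g. Spin or the wrapper)."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def allocate(self, nbytes):
        self.current += nbytes
        self.peak = max(self.peak, self.current)   # remember the high-water mark

    def release(self, nbytes):
        self.current -= nbytes

# A buffer that is allocated and later freed leaves no trace in the final
# figure, but it does show up in the peak.
wrapper = MemoryCounter()
wrapper.allocate(100)      # long-lived data
wrapper.allocate(900)      # temporary communication buffer
wrapper.release(900)
print(wrapper.current, wrapper.peak)
```

A figure taken only at the end of the run corresponds to `current`, which is why memory that is released before termination goes unreported unless the peak is tracked separately.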

The results from one such experiment are presented in Table 2. Verification using 
serial Spin of the model used in this experiment consumed 1100.946 MB of memory.
This experiment was performed on a four processor (P0, P1, P2, and P3) shared
memory LINUX computer with 16GB of total physical memory. The total amount of 
memory used by all four processors to distribute the memory requirements of SPIN 
was 1414 Mbytes. The amounts of memory consumed by each processor as reported 
by PSPIN are listed in the column labeled Total Memory Reported, rows one through 
four. Note that the values in this column match the values of peak memory consumed 
by a processor for the SPIN verification (column Peak Memory SPIN). So we can 
conclude that the reported memory usage only consists of SPIN -allocated memory 
(and not PSPIN-allocated memory). The column labeled Peak Memory PSPIN is the 
peak memory consumed by a processor as a result of overhead from PSPIN memory 
allocations. It does not include SPIN memory. 



Table 2. PSPIN on four processors, with detailed tracking of memory usage.

Processors   Total Memory    Peak Memory   Peak Memory   Script Results
             Reported (MB)   Spin (MB)     PSPIN (MB)    from Top

P0           302.19          302.19        50.44         ~ 328 MB
P1           322.37          322.37        1,883.87      ~ 3 GB
P2           407.15          407.15        115.48        ~ 464 MB
P3           382.99          382.99        49.28         ~ 400 MB



Table 2 shows that one processor (P1) requires approximately 30% more memory
than serial SPIN. To confirm this observation, we wrote a simple UNIX shell script
that recorded results from the UNIX system command “top” every 5 seconds while
the above experiment was conducted. The results from the shell script, listed in
column 5, show that processor P1 requires the largest amount of memory by
far. The unreported PSPIN memory usage is due to communication buffers and 
explains why at times PSPIN does not complete on large DEOS models. 

The notion of peak memory is a critical measure of effectiveness of any distributed 
verification scheme. If the actual memory consumed during a verification run is more 
than the available physical memory, it does not matter if the final consumption would 
be less than the physical memory, as verification would definitely fail. Our analyses 
show that the predominant component of this memory used by PSPIN is 
communication overhead. 



4 State Space Partitioning 

The Source Code Partitioning scheme tries to elicit the structure of the state space 
given the structure of the flow control graph as well as by weighting this graph by 
traversing a sample of the state space. A related approach is to elicit the structure of
the state space by examining a sample of this space and constructing a graph, not of
the flow control, but of the state space directly. That is, during a prerun, a graph can
be constructed that represents states as vertices and the transitions between states as 
the edges of a graph. Then this graph can be partitioned directly by an off-the-shelf 
graph-partitioning package. 
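
Building such a sample graph can be sketched as a depth-bounded traversal. This is a minimal sketch under stated assumptions: `successors` is a hypothetical stand-in for the model's transition relation, and a breadth-first order is used here for simplicity, which is not necessarily the order in which a Spin prerun visits states.

```python
from collections import deque

def sample_state_graph(initial, successors, max_depth):
    """Depth-bounded prerun. Returns (examined, open_states, edges)."""
    examined, open_states, edges = set(), set(), set()
    depth = {initial: 0}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if depth[state] == max_depth:
            open_states.add(state)        # reached, but successors not generated
            continue
        examined.add(state)
        for nxt in successors(state):
            edges.add((state, nxt))       # one graph edge per transition seen
            if nxt not in depth:
                depth[nxt] = depth[state] + 1
                frontier.append(nxt)
    return examined, open_states, edges

# Toy transition relation (an assumption for illustration).
examined, open_states, edges = sample_state_graph(0, lambda s: [s + 1, s + 2], 2)
print(sorted(examined), sorted(open_states), len(edges))
```

The vertices are the examined and open states together, and the edge set can be handed to a graph partitioner once vertex weights have been attached.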



4.1 Weighting Vertices 

Weights can be applied to vertices by a number of methods. Typically a single 
weight is assigned to each vertex. However, a single weight does not allow you to 
model both the memory and computation associated with a single state. You can only 
model one or the other. It is possible to model both memory and computation if each 
vertex is given two weights. That is, each vertex of the graph is assigned a weight 
vector of size two. The first element of this vector represents the memory 
requirement of the state and the second represents the work requirement of the state. 
Every vertex that corresponds to an examined state is given a weight vector of (1, 0) 
that indicates the state requires one unit of memory to store, but no further work is 
associated with this state. Every vertex that corresponds to an open state is given a 
weight vector of (1, 1) that indicates it has both memory and work requirements. A 
multi-constraint graph partitioner [11] can be used to partition such a graph. (Note 
that the Metis package also implements a number of multi-constraint graph 
partitioning algorithms.) 
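
The two-element weight vectors can be illustrated with a small sketch. The function names and the dictionary-based assignment are assumptions made for illustration; a real multi-constraint partitioner would search for the assignment rather than take it as input.

```python
def vertex_weight(is_open):
    """(memory, work): examined states cost memory only, open states cost both."""
    return (1, 1) if is_open else (1, 0)

def partition_loads(assignment, open_states, num_parts):
    """Per-partition [memory, work] totals for a vertex -> partition assignment."""
    loads = [[0, 0] for _ in range(num_parts)]
    for vertex, part in assignment.items():
        memory, work = vertex_weight(vertex in open_states)
        loads[part][0] += memory
        loads[part][1] += work
    return loads

open_states = {"b", "d"}
balanced = partition_loads({"a": 0, "b": 0, "c": 1, "d": 1}, open_states, 2)
skewed = partition_loads({"a": 0, "c": 0, "b": 1, "d": 1}, open_states, 2)
print(balanced, skewed)
```

Both assignments place two states on each partition, so a single memory weight cannot tell them apart. Only the second weight reveals that the skewed assignment puts all remaining work on one partition.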



4.2 Generalized State Vector Element Partitioning 

The Source Code Partitioning scheme essentially partitions the state space based upon 
the PC for a selected process p. We can automate this approach to some extent by 
computing a partitioning for each Spin process, and then automatically selecting the 
best partitioning to use. Hence, the user does not need to select a process manually. 
Indeed, in our experiments, we have found that it is often the case that certain 
processes can result in extremely bad partitionings. 

We can further extend and generalize this scheme by allowing the state space to be 
partitioned based upon any element of the state vector — and not just the various PCs. 
A simple algorithm is to compute a partitioning for every element of the state vector 
and to select the state vector element and partition array of the best partitioning seen. 
We have implemented an algorithm that does so and refer to it as State Vector 
Element Partitioning (SVEP). 

The SVEP algorithm requires a prerun, during which the graph that models the 
state space is constructed. At this time, it is necessary to record not only the 
connectivity among the states in the state space, but also the state vector for each state 
visited. The prerun terminates after a pre-selected depth is reached. After the 
augmented state-space graph is constructed, SVEP applies the following algorithm in 
a recursive bisection manner [2] to result in a k-way partitioning.
For each element of the state vector:

1. The range of values r that has been seen for the element is determined.

2. If r is less than or equal to a specified threshold t, then the graph is collapsed
into r vertices. If r is greater than t, then the values are grouped together into t
evenly sized sets, and the graph is collapsed into t vertices.

3. The graph is collapsed by creating a single super vertex for each distinct value (or 
set of values in the case in which r is greater than t) seen. The weight of each 
super vertex is equal to the sum of the weights of its component vertices. 

4. For each edge e in the initial (i.e., uncollapsed) graph that connects two vertices 
that are mapped to different super vertices, either (i) a super edge is added to the 
collapsed graph, or (ii) if a super edge already exists between the two super 
vertices, then the weight of e is added to the weight of the super edge. 

5. All edges of the initial graph that connect vertices that are mapped to the same 
super vertex are ignored. 

6. The resulting collapsed graph is likely to be small. If it contains fewer than 15
vertices, a two-way partitioning is computed optimally by brute force. (That is,
every possible partitioning is examined, and the best one is returned.) If the
collapsed graph is larger than 15 vertices, Metis is called to partition it in half.

The best partitioning that is found by this process is returned along with identification
information (e.g., the index of the associated element and its range information). The
returned information is used to generate a partition function.
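The collapse and brute-force steps listed above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the edge representation, and the rule for ranking partitionings (smallest edge-cut first, then best balance) are assumptions, and the grouping of values into t sets (step 2) and the call to Metis are omitted.

```python
from collections import defaultdict
from itertools import combinations

def collapse(states, edges, element):
    """Collapse the graph on one state-vector element (steps 3 to 5).

    states: list of state vectors (tuples), each with unit weight (assumption).
    edges:  list of (i, j, weight) triples over indices into states.
    """
    vweight = defaultdict(int)
    eweight = defaultdict(int)
    for s in states:
        vweight[s[element]] += 1          # super-vertex weight = sum of members
    for i, j, w in edges:
        a, b = states[i][element], states[j][element]
        if a != b:                        # edges inside a super vertex are ignored
            eweight[frozenset((a, b))] += w
    return dict(vweight), dict(eweight)

def best_bisection(vweight, eweight):
    """Examine every two-way split; rank by (edge-cut, balance) (assumed rule)."""
    values = sorted(vweight, key=repr)
    total = sum(vweight.values())
    first, rest = values[0], values[1:]   # fix one vertex to skip mirror images
    best = None
    for r in range(len(rest)):            # r < len(rest) keeps side 1 non-empty
        for extra in combinations(rest, r):
            side0 = {first, *extra}
            cut = sum(w for e, w in eweight.items() if len(e & side0) == 1)
            w0 = sum(vweight[v] for v in side0)
            balance = max(w0, total - w0) / (total / 2)
            if best is None or (cut, balance) < best[0]:
                best = ((cut, balance), side0)
    return best

# Hypothetical four-state graph, collapsed on state-vector element 1.
states = [(0, 0), (0, 1), (1, 1), (1, 2)]
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 2)]
vw, ew = collapse(states, edges, 1)
key, side0 = best_bisection(vw, ew)
```

Running the same two functions once per state-vector element and keeping the best result would give the outer loop that the text describes.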




Fig. 4. (a) An example of an augmented state-space graph, and (b) the resulting collapsed graph 
with respect to the value of variable y 

Figures 4 and 5 illustrate this process. Figure 4(a) shows an augmented state-space
graph. In this example, the state vector consists of three elements that correspond to
Promela variables x, y, and z. (Note that, for the purposes of this simple example, we do
not include process PCs in the state vector. However, any or all of these three
variables could be PCs and the results would be the same.) Vertices are shaded if
their corresponding states have been examined. White vertices correspond to open
states. The range of seen values for each element is shown at the top of the figure.
For example, variable z has taken the values 1, 2, 3, 4 and nan (i.e., undefined) in the
sampled state space. The graph shows each state encountered along with its unique
state vector and the connectivity among the states. For example, state (1, 0, 4) can
reach or be reached by states (0, 0, nan), (1, 0, 3), and (1, -1, 3).

Figure 4(b) shows how the augmented graph is collapsed with respect to variable y. 
(Note that since only two values are seen for variable x, the resulting graph has only 
two vertices and the partitioning can be performed trivially.) In this case, the range of
three values that are seen for y results in a collapsed graph with three vertices. The
optimal partitioning of this graph is shown. This partitioning has an edge-cut of nine 
and state balances of 1.2 and 1.0. (Balances are computed by max sub-domain weight 
over average weight.) The partition function that is derived from this partition array 
is shown. In this case, the partitioning results in a better edge-cut and the same 
balance compared to the partitioning that was trivially computed for variable x. 
Hence, this partitioning is saved as the current best. 
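The balance measure quoted in parentheses above (maximum sub-domain weight over average sub-domain weight) can be written as a one-line function. The sketch below is illustrative only; the weights passed to it are hypothetical and are not read from Figure 4.

```python
def balance(subdomain_weights):
    """Maximum sub-domain weight divided by the average sub-domain weight."""
    average = sum(subdomain_weights) / len(subdomain_weights)
    return max(subdomain_weights) / average

# Hypothetical two-way splits: weights 6 and 4, and a perfectly even 5 and 5.
uneven = balance([6, 4])
even = balance([5, 5])
```

A value of 1.0 means the sub-domains carry equal weight; larger values mean that the heaviest sub-domain exceeds the average by that factor.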

Figure 5 shows the graph collapsed with respect to variable z. A range of five 
values results in a collapsed graph with five nodes. The optimal partitioning is shown 
for the collapsed graph. This partitioning has an edge-cut of eight (same as the 
current best) and balances of 1.07 and 1.3. Depending on which vertex weight is
favored, either of these partitionings could be selected as the current best. We give
more weight to the balance that corresponds to the open vertices, as this approach
favors balancing the predicted work among the processing nodes over balancing the
memory load. Therefore, the current best partitioning remains that of variable y.




Fig. 5. The augmented state-space graph collapsed with respect to the value of variable z 

This algorithm is applied in a general recursive bisection manner [2]. That is, for a
k-way partitioning, after the initial bisector of the state space is determined, the
augmented state-space graph is split into two graphs using this bisector. Then the
algorithm is applied recursively to both sub-graphs. The automatically generated
partition function then is structured as a set of nested if-then-else statements. This
scheme can handle values of k that are not powers of two by splitting the graph, not
exactly in half by weight, but into one-third and two-thirds weight regions. The
one-third side is not split further, while the two-thirds side is split further.
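One way to picture the uneven split is to compute the target weight fraction of every region in the bisection tree. The sketch below is an assumption-laden illustration, not the authors' code: it generalizes the one-third/two-thirds case described above to arbitrary k by giving floor(k/2) parts to one side and the remainder to the other, and the function names are hypothetical.

```python
from fractions import Fraction

def bisection_plan(k, share=Fraction(1)):
    """Return (weight share of this region, left sub-plan, right sub-plan)."""
    if k == 1:
        return (share, None, None)        # a final sub-domain: not split further
    small = k // 2                        # parts on the lighter side (assumed rule)
    large = k - small
    return (share,
            bisection_plan(small, share * small / k),
            bisection_plan(large, share * large / k))

def leaf_shares(plan):
    """Collect the weight share of every final sub-domain, left to right."""
    share, left, right = plan
    if left is None:
        return [share]
    return leaf_shares(left) + leaf_shares(right)

plan3 = bisection_plan(3)
```

For k = 3 the first bisector separates one third of the weight from two thirds, and only the two-thirds side is bisected again, so each of the three final sub-domains targets one third of the total weight.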

Sometimes a variable might only take a single value throughout the entire sample 
space. In this case, it is not possible to compute a partition using this variable. Also, 
it is possible that more values might be eventually encountered in the full state space 
than are seen during the prerun. Therefore, the partition function must be generated 
to be complete. In our implementation, we cover these possible values by a simple 
scheme. (That is, all values less than those seen are mapped to node zero. All values 
higher than those seen are mapped to node one.) 
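A complete partition function for a single bisector might look like the following sketch. The names are hypothetical, and one detail is an assumption that the text does not cover: a value that lies inside the seen range but was never observed is sent to the node of the nearest seen value below it.

```python
import bisect

def make_partition_function(index, value_to_node):
    """Build a total function from state vectors to node 0 or node 1.

    index:         position of the chosen element in the state vector.
    value_to_node: node assigned to each value seen during the prerun.
    """
    seen = sorted(value_to_node)

    def partition(state_vector):
        v = state_vector[index]
        if v < seen[0]:
            return 0                      # below every seen value: node zero
        if v > seen[-1]:
            return 1                      # above every seen value: node one
        pos = bisect.bisect_right(seen, v) - 1
        return value_to_node[seen[pos]]   # unseen in-range value: assumption

    return partition

# Hypothetical assignment for an element whose seen values are -1, 0 and 1.
node_of = make_partition_function(1, {-1: 0, 0: 1, 1: 1})
```

Nesting such functions, one per bisector, would yield the if-then-else structure that the generated partition function uses for a k-way partitioning.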



4.3 Evaluation of SVEP 

In this section, we compare the results from using SVEP against those obtained by 
PSPIN’s SCP and LHP schemes. All experiments were conducted on a network of 
LINUX workstations at Michigan Technological University (MTU). The MTU 
cluster consists of workstations that each have a 2.0 GHz Pentium 4 processor and 2 GB of memory.
In all of the experiments, partial order reduction was disabled and compression was 
enabled. In addition, the amount of memory that a SPIN process can consume was set 
to 2GB. The maximum depth of the stack was set to 300,000 for experiments using 
LHP and SVEP schemes as they typically produced search depths in excess of the 
10,000 default used by Spin. The maximum depth for the stack was left at 10,000 for 
experiments using SCP. The stack depths were not fine-tuned, as specifying a depth
greater than the actual depth does not invalidate the results, and having the same depth
for both LHP and SVEP eliminated the effects of stack depth on memory usage from 
consideration. The Deos model used for these tests was a medium sized one, with a 
depth of 267,386, memory of 1000.84MB, and a run time of 98 seconds when run 
serially. Note that peak memory is discussed separately, after the discussions 
regarding other performance measures. 

Results from SVEP 

As stated earlier, our scheme generates a partition function based on exploring a 
sample of the state space. Table 3 lists results obtained using our algorithm in three
sets of experiments that used between two and 16 processors. In the first set, a very small
amount of the state space (0.04%) was explored during the prerun. In the second set, 
0.16% of the state space was explored, and 0.24% of the state space was explored in 
the third set of experiments. The table is arranged such that results of all three sets 
are grouped together for easy comparison. The first criterion examined is 
computation time in seconds. The table shows both the time for the PSPIN 
computation, and the time for the prerun in the parentheses. We note that the 
computation time when four through sixteen processors are used is always less than 
the computation time required for 2 processors. This is a good result in the sense that 
we do not see a trend where the computation time increases as the number of 
processors is increased. The second criterion examined is the amount of memory in 
Mbytes reported by PSPIN. Recall that the results listed are strictly the amounts of 
memory required by the Spin part of the verification and do not reflect the overhead
memory required by a single PSPIN process. Our tables list the memory consumed
by the node that consumed the most memory. These results show that the maximum
amount of memory required to perform the Spin verification decreases as the number
of processors is increased for all three sets of experiments. The third and fourth
criteria measure communication overhead. The third criterion is the total number of 
messages transmitted as a result of the distributed computation using PSPIN. The 
fourth criterion is the total information transmitted in a distributed computation of 
PSPIN. Note that all criteria examined are generated as output of PSPIN. There are 
no discernable trends that can be observed from the communications results. In 
general, when sampling between 0.04% and 0.24% of the state space, the results can 
vary significantly. This is not unexpected, as our partitioning scheme is both heuristic 
and predictive. 

In order to make the following comparisons straightforward, we will use a single
result from our experiments above (and not three results). For the remainder of the
paper, when comparing the results from our algorithm with the results from partitioning
by SCP and LHP, we will use the worst result selected for each
criterion given in Table 3. For example, we use a PSPIN computation time of 3230
seconds for two processors, with the corresponding prerun time of 80 seconds, and a max
memory of 170.786 MB for 16 processors. We feel that this is a fair way to compare the
schemes, as results from our algorithm can vary significantly. Note that performance
comparisons with respect to peak memory are discussed later in this section.

SVEP versus SCP

Table 4 is similar in format to Table 3 with the exception that the column labeled 
“Algo” indicates the type of algorithm used for generating the results. The results 
from partitioning by SCP show that the maximum amount of memory needed for Spin 
on any given processor is higher than the corresponding amount of memory needed 
by our algorithm. Also, the total number of messages communicated with SCP is 
much higher than the corresponding number for SVEP. In addition, the computation 
time required by SCP is higher than the computation time with SVEP. Thus, the 
SVEP scheme requires less memory than SCP, less communication and also less 
computation time. Therefore, our method is an improvement over SCP. This result is 
reasonable as our scheme can be considered as a generalization of the SCP scheme. 

SVEP versus LHP 

As stated earlier, we have found that LHP yields the best results when the
local process that corresponds to the DEOS kernel is used to generate the partition
function. The LHP results listed in Table 5 were obtained in this way.

When comparing the LHP and SVEP schemes, we notice that our scheme is much
faster than PSPIN with LHP across the board, and especially as the number of
processors increases. This is explained by the fact that both the number of messages and
the total amount of information communicated with LHP are much higher than with
the SVEP scheme. The max memory requirements are better for LHP than for our
scheme. This is because LHP balances the states better across the processors.
However, it is important to note that both schemes are scalable with respect to
maximum memory usage. That is, the maximum memory requirements of both
schemes decrease as the number of processors increases.

Though LHP outperforms our SVEP scheme in minimizing the amount of memory 
consumed by any given processor, the SVEP scheme reduces communication and 
computation time. The problem of minimizing the amount of memory on any given 
processor is not dependent on computation time. However, if the maximum amount of 
memory consumed by the SVEP scheme and all overhead memory needed are within 
the constraints of the driving environment, then our method can be used to partition a 
serial application on n processors and can outperform local hash by over an order of 
magnitude in runtime. 



Table 3. SVEP performance when different amounts of state space are explored during prerun

Criterion   %State Space   2            4            8            12           16
Time (s)    0.04%          3062 (80)    1133 (107)   702 (129)    1645 (115)   2409 (116)
            0.16%          3053 (80)    1714 (107)   2250 (129)   1692 (115)   2388 (116)
            0.24%          3230 (80)    2119 (107)   1171 (132)   1205 (150)   2039 (152)
mem (MB)    0.04%          508.911      358.793      208.981      185.225      170.786
            0.16%          508.911      364.117      235.298      199.765      138.837
            0.24%          747.81       381.73       233.557      234.786      128.29
Msgs        0.04%          1,800        951,847      1,662,040    4,594,660    4,685,420
            0.16%          1,800        332,484      704,782      3,372,180    3,430,090
            0.24%          276,300      400,659      1,496,420    1,872,560    2,088,260
Inf Trans   0.04%          6.24         1,511.57     3,526.98     9,974.27     9,962.32
            0.16%          6.24         793.156      1,820.6      6,916.97     85,363.06
            0.24%          741.01       1,318.75     3,641.21     4,959.62     5,590.97



Table 4. Comparison of SVEP and SCP partitioning schemes

Criterion   Algo   2            4             8             12            16
Time (s)    SVEP   3,230 (80)   2,119 (107)   2,250 (129)   1,692 (115)   2,388 (116)
            SCP    5,937 (4)    4,815 (5)     3,334 (9)     4,078 (12)    3,145 (16)
mem (MB)    SVEP   747.81       381.73        235.30        234.79        138.84
            SCP    879.18       879.18        392.58        356.12        218.85
Msgs        SVEP   276,300      400,659       1,662,040     4,594,660     4,685,420
            SCP    3.97 M       3.97 M        9.44 M        10.274 M      14.617 M
Inf Trans   SVEP   741.02       1,511.57      3,641.21      9,974.27      85,363.06
            SCP    6,774.86     6,774.74      15,869.1      20,032.1      24,471.1



Analysis of peak memory 

We have mentioned earlier that LHP requires less memory than our partitioning 
scheme to distribute Spin memory requirements. We have also mentioned that PSPIN 
overhead memory requirements for LHP can at times be much higher than the 
memory for serial Spin. Therefore, in analyzing our algorithm versus LHP, we need 
to look at PSPIN overhead memory requirements in both approaches.

Table 5. Comparison of SVEP and LHP partitioning schemes

Criterion    Algo   2          4          8          12          16
Time (s)     SVEP   3,230      2,119      2,250      1,692       2,388
             LHP    3,631      6,025      35,367     40,772      11,783
mem (MB)     SVEP   747.81     381.73     235.30     234.79      138.84
             LHP    532.053    281.89     154.709    104.943     87.842
Total Msgs   SVEP   276,300    400,659    1.662 M    4.595 M     4.685 M
             LHP    11.685 M   22.048 M   28 M       27.925 M    29.889 M
Inf Trans    SVEP   741.02     1,511.57   3,641.21   9,974.27    85,363.06
             LHP    18,731.4   62,858.4   91,014.7   120,717.0   96,186.9



Table 6. Comparison of peak memory between LHP and SVEP

Peak Mem (MB)           2 procs   4 procs   8 procs   12 procs   16 procs
LHP                     267.73    655.43    1144.22   2580.74    1242.11
SVEP            0.24%   849.74    1084.5    1460.49   1510.21    1466.92
SVEP            0.16%   19.12     415.01    582.56    2104.37    1684.53



Table 7. Comparison of max messages between LHP and SVEP

Max msg      2 procs     4 procs     8 procs     12 procs    16 procs
LHP          1.131E+11   3.703E+11   3.790E+11   3.044E+11   2.247E+11
SVEP 0.24%   163884      53172       390672      357636      357636
SVEP 0.16%   1800        104416      129312      505054      372574



As we can see from Table 6, there are times when our algorithm requires more 
overhead memory for PSPIN and other times when LHP requires more memory. Our 
initial thought is that reductions in the total number of communications and the total amount
of information transferred will reduce overhead memory. However, further analysis
is required. At present, we are looking at the maximum number of messages that
any processor sent and the maximum overhead (possibly of a different processor in the
same verification) in an attempt to understand whether there is a correlation between the two
quantities. Table 7 lists the maximum number of messages (max msg) sent by a
node, for comparison with the overhead.

In an attempt to further analyze trends for the increase in memory overhead, we 
resorted to statistical analysis of the maximum number of messages sent versus the 
memory overhead. When using SVEP, we noticed that maximum messages and 
overhead memory correlate positively (i.e. one increases as the other increases), with 
a correlation factor of 0.82. This strong correlation indicates that it might be a 
worthwhile direction for better understanding memory overhead. 
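For readers who want to repeat this kind of analysis, the correlation between two series of measurements can be computed as sketched below. The text does not say which statistic was used, so the Pearson coefficient here is an assumption, and the sample series are hypothetical rather than taken from Tables 6 and 7.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical series: one rises with the other, one falls as the other rises.
rising = pearson([1, 2, 3, 4], [10, 20, 30, 40])
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

Values close to +1 indicate that the two quantities increase together, values close to -1 that one decreases as the other increases, and values near 0 that there is no linear relationship.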



5 Related Work 



The work done by Lerda and Visser [13] on distributing JPF is most closely related to 
our work. The difference lies in their approach of starting from Java. In addition, they 
also explore various ways of optimizing the communication and verification time. We 
plan to implement such optimizations once we better understand the nature of the
peak memory. There has also been work on various extensions to PSPIN [1] and Spin
[5,9]. In addition, there has been work on distributing other formal analysis tools,
such as those for Petri nets [3].



6 Conclusions and Future Directions 

There is increasing interest in verifying large models in the context of safety-critical 
systems, and distributed verification is a promising approach for achieving this. In 
this paper, we evaluated a distributed implementation of Spin. Our analyses show that 
there is a penalty in terms of verification time for distribution, while the actual 
memory used for states (i.e. the PSPIN reported memory) reduces almost linearly 
with the number of processing nodes. We also presented another partitioning scheme 
that performs much better in terms of verification time, while not requiring the user to 
choose a specific Spin process in order to achieve good partitioning. But all the 
partitioning schemes suffer from high memory overhead. Our results point to a strong 
correlation between overhead and the maximum number of messages sent by a node. 
Exploration of this correlation and distribution of partial order reduction are aspects 
we will be pursuing in the future. 

It is possible to generalize the SVEP scheme by considering partitionings that are 
combinations of the values seen from multiple variables. For example, the example 
graph shown in Figure 4(a) could be collapsed with respect to the values of both x and 
y. In this case, there are six possible combinations of the seen values ((0,-1),
(0,0), (0,1), (1,-1), (1,0), (1,1)), and so the collapsed graph would have six vertices.
This extension will increase the run time of the algorithm significantly, as many more 
partitionings need to be computed and compared. However, it is likely to come up 
with better results. Also, it is worth investigating extending the SVEP scheme to be 
used not only for static (i.e., a priori) load balancing, but also for dynamic load 
balancing. This extension would address the problem that the structural 
characteristics of the state space might change after a certain point in the execution. 
In the current, static load balancing approach, only a small portion of the state space is 
sampled and this space is early in the execution. Finally, we would like to develop a 
partitioning scheme that focuses on minimizing peak memory. 



Acknowledgements. We would like to convey our thanks to Michigan Technological

References

[1] Barnat, J, Brim, L, and Stribrna, J, "Distributed LTL Model-Checking in SPIN", Technical
Report FIMU-RS-2000-10, Faculty of Informatics, Masaryk University, 2000.

[2] Berger, M and Bokhari, S, "A Partitioning Strategy for Nonuniform Problems on
Multiprocessors", IEEE Transactions on Computers, C-36(5), pages 570-580, May 1987.

[3] Ciardo, G, German, R, and Lindemann, C, "A Characterization of the Stochastic Process
Underlying a Stochastic Petri Net", Software Engineering, Vol. 20 No. 7, 1994.

[4] Cofer, Darren and Rangarajan, Murali, "Formal Modeling and Analysis of Advanced
Scheduling Features in an Avionics RTOS", in Proceedings of EMSOFT '02: Second
International Workshop on Embedded Software, Springer-Verlag.

[5] Demartini, C, Iosif, R, and Sisto, R, "dSpin: A Dynamic Extension of SPIN", Lecture
Notes in Computer Science, Vol. 1680, September 1999.

[6] Giannakopoulou, D and Lerda, F, "From States to Transitions: Improving Translation of
LTL Formulae to Büchi Automata", Proc. of the 22nd IFIP International Conference on
Formal Techniques for Networked and Distributed Systems, November 2002.

[7] Holzmann, G J, "The Model Checker Spin", IEEE Trans. on Software Engineering,
23(5):279-295, May 1997.

[8] Honeywell, "Design Description Document for the Digital Engine Operating
System", Honeywell Specification no. PS7022409, 1999.

[9] Iosif, R and Sisto, R, "Using Garbage Collection in Model Checking", Proc. of the 7th
International SPIN Workshop, Vol. 1885, Springer-Verlag, 2000.

[10] Karypis, G and Kumar, V, "METIS: A Software Package for Partitioning Unstructured
Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices,
Version 4.0", Technical Report, Dept. of Computer Science, University of Minnesota,
1998.

[11] Karypis, G and Kumar, V, "Multilevel Algorithms for Multi-Constraint Graph
Partitioning", in Proceedings of Supercomputing 1998, 1998.

[12] Lerda, F and Sisto, R, "Distributed-Memory Model Checking with Spin", in Proceedings
of the 5th International Spin Workshop, Vol. 1680 of LNCS, Springer-Verlag, 1999.

[13] Lerda, F and Visser, W, "Addressing Dynamic Issues of Program Model Checking",
Proc. 8th SPIN Workshop, Lecture Notes in Computer Science, Vol. 2057, 2001.




Verification of MPI-Based Software for Scientific Computation



Stephen F. Siegel and George S. Avrunin 

Laboratory for Advanced Software Engineering, Dept. of Computer Science,
University of Massachusetts, Amherst MA 01003, USA
{siegel,avrunin}@cs.umass.edu
http://laser.cs.umass.edu



Abstract. We explore issues related to the application of finite-state
verification techniques to scientific computation software employing the
widely-used Message-Passing Interface (MPI). Many of the features of
MPI that are important for programmers present significant difficulties
for model checking. In this paper, we examine a small parallel program
that computes the evolution in time of a discretized function u defined
on a 2-dimensional domain and governed by the diffusion equation.
Although this example is simple, it makes use of many of the problematic
features of MPI. We discuss the modeling of these features and use Spin
and INCA to verify several correctness properties for various configurations
of this program. Finally, we describe some general theorems that
can be used to justify simplifications in finite-state models of MPI programs
and that guarantee certain properties must hold for any program
using only a particular subset of MPI.



1 Introduction 

The advent of relatively cheap clustered computers and infrastructure supporting
parallel programs on them has made supercomputers much more accessible
and greatly expanded the range of application of scientific computing. Yet
developers of parallel scientific software have encountered many problems that
are familiar to specialists in verification of concurrent software: programs
deadlock; they display inappropriate nondeterministic behavior; bugs are difficult to
reproduce, let alone pinpoint and correct. Finite-state verification (FSV)
techniques, such as model checking, can offer solutions to many of these problems, at
least in principle. But typical scientific parallel programs pose special challenges
for FSV techniques. For instance, in the widely used Message-Passing Interface
(MPI) [13], the memory available for buffering messages between two processes,
and thus the number of messages that can be buffered, can change dynamically
and unpredictably during execution.

In this paper, we describe some first attempts at applying FSV techniques
to a small, but realistic, example of a parallel scientific program that uses MPI.
This program computes the evolution in time on a 2-dimensional domain of a
discretized function governed by the differential equation known as the
“diffusion” (or “heat”) equation. We discuss some of the issues in modeling such
programs and present the results of verifying several data-independent properties
of the communication skeletons of various configurations of this program,
using Spin [7] and INCA [3,15]. Finally, we describe some theorems that can be
used to justify simplifications in models of MPI programs, and that guarantee
certain properties must hold for programs using only a particular subset of MPI.

In the next section, we briefly discuss the basic MPI constructs and introduce 
our example program, Diffusion2d. In section 3, we explain how we modeled 
this program for Spin and present the results of our initial attempts to verify 
properties with Spin. We then discuss modeling and verification with INCA. In 
section 5, we present the theorems. In section 6, we briefly describe some related 
work, and we discuss our conclusions and plans for future work in section 7. 

2 The Message-Passing Interface and an Example Program

Most parallel scientific programs rely on message-passing for inter-process
communication. The basic ideas of this paradigm have been around since the late
1960s, and by the early 1990s, several different and incompatible message-passing
systems were being used to develop significant applications. The desire for
portability and a recognized standard led to the creation of the Message-Passing
Interface, which defines the precise syntax and semantics for a library of functions
for writing message-passing programs in a language such as C or Fortran. Since
that time, a number of high-quality proprietary and open-source MPI
implementations have become available on almost any platform, and MPI has become the
de facto standard for parallel scientific software.

2.1 MPI Basics 

An MPI program consists of autonomous processes executing their own code in 
an MIMD style, although in practice the code executed by the different processes 
is often identical. The processes communicate by calls to functions implementing 
the MPI communication primitives. There are more than 140 such functions in 
the MPI-1 Standard. In this section, we briefly describe a few of the most basic 
ones, those used in our Diffusion2d example. 

Each of the MPI functions described here takes, as one of its arguments,
an MPI communicator. A communicator specifies a set of processes which may
communicate with each other through MPI functions. Given a communicator
with n processes, the processes are numbered 0 to n - 1; this number is referred
to as the rank of the process in the communicator. MPI provides a pre-defined
communicator, MPI_COMM_WORLD, which represents the set of all processes.

The simplest function for sending a message from one process to another is 
the standard mode, blocking send function, with the form 

MPI_SEND(buf, count, datatype, dest, tag, comm),

where buf is the address of the first element in the sequence of data to be sent;
count is the number of elements in the sequence; datatype indicates the type of
each element in the sequence; dest is the rank of the process to which the data
is to be sent; tag is an integer that may be used to distinguish the message; and
comm is a handle representing the communicator in which this communication
is to take place.

The simplest function for receiving a message is the blocking receive function, 
with the form 

MPI_RECV(buf, count, datatype, source, tag, comm, status). 

Here, buf is the address for the beginning of the segment of memory into which
the incoming message is to be stored. The integer count specifies an upper bound
on the number of elements of type datatype to be received. The integer source
is the rank of the sending process and the integer tag is the tag identifier.
However, unlike the case of MPI_SEND, these last two parameters may take the
wildcard values MPI_ANY_SOURCE and MPI_ANY_TAG, respectively. The
parameter comm represents the communicator, while status is an “out” parameter
used by the function to return information about the message that has been
received. An MPI receive operation will only select a message for reception if the
source, tag, and communicator of the message match the corresponding receive
parameters appropriately.

The MPI implementation may decide to buffer an outgoing message. On the 
other hand, the implementation may decide to block the sending process, perhaps 
until the size of the system buffer becomes sufficiently small, before sending out 
the message. Finally, the system may decide to block the sending process until 
the receiving process is at a matching receive, and there is no pending message in 
the system buffer that also matches that receive. If the implementation chooses 
this last option, we say that it forces this particular send to be synchronous. 

The MPI Standard requires that messages be nonovertaking, in the sense 
that two messages from a single sender to single destination, both matching the 
same receive, can only be received in the order in which they were sent. But this 
is the only guarantee concerning the order in which messages are received — if 
two different processes send messages to the same process, the messages may be 
received in the order opposite to that in which they were sent. 
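The matching and ordering rules described above can be captured in a small model. The Python sketch below is not an MPI binding and calls no MPI library; the tuple types, the ANY marker standing in for the two wildcards, and the function names are all hypothetical and exist only for illustration.

```python
from collections import namedtuple

ANY = None  # stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG in this model only
Msg = namedtuple("Msg", "source dest tag comm data")
Recv = namedtuple("Recv", "dest source tag comm")

def matches(msg, recv):
    """Source, tag and communicator must agree, allowing for wildcards."""
    return (msg.dest == recv.dest and msg.comm == recv.comm
            and recv.source in (ANY, msg.source)
            and recv.tag in (ANY, msg.tag))

def eligible(pending, recv):
    """Messages the receive may select; pending is listed in send order.

    Nonovertaking: per sender, only the earliest matching message qualifies.
    Messages from different senders may be selected in either order.
    """
    senders_seen = set()
    result = []
    for m in pending:
        if matches(m, recv) and m.source not in senders_seen:
            senders_seen.add(m.source)
            result.append(m)
    return result

pending = [Msg(0, 2, 5, "world", "a"),
           Msg(1, 2, 5, "world", "b"),
           Msg(0, 2, 5, "world", "c")]
from_anyone = [m.data for m in eligible(pending, Recv(2, ANY, 5, "world"))]
from_rank_0 = [m.data for m in eligible(pending, Recv(2, 0, ANY, "world"))]
```

In this model a wildcard receive at rank 2 may select either "a" or "b" first, but never "c" before "a", since both of those were sent by rank 0.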

MPI also provides non-blocking send and receive operations in which, for 
example, a sending process can continue execution even before the system has 
finished copying the message from the send buffer. In addition, both the blocking 
and non-blocking sends come in several modes. We have described the standard
mode, but there are also the synchronous, buffered, and ready modes.

It is common in MPI programs to have processes exchange data, either directly,
when two processes send to and receive from each other, or more generally,
as when each process in a grid sends to one neighbor and receives from another
neighbor. In such cases, each process must execute one send and one receive,
but if the program is coded so that each process first sends and then receives,
the program will deadlock if, for example, the MPI implementation chooses to
synchronize all the sends. The situation occurs frequently enough that MPI
provides a special function that executes a send and a receive in one invocation,
as if the process forks off two independent threads (one executing the send, the
other the receive) and returns when both have completed. This function has the
form

MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount,
recvtype, source, recvtag, comm, status).

The first 5 parameters are the usual ones for send, the next 5 together with 
status are the usual ones for receive, and comm specifies the communicator for 
both the send and receive. 
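The deadlock scenario that motivates this combined operation can be reproduced with a toy model of a ring exchange. The sketch below is a simplified abstraction written for illustration, not MPI code: each process sends to its right-hand neighbor and receives from its left-hand neighbor, and the function and parameter names are hypothetical.

```python
def ring_exchange_completes(n, synchronous, send_then_recv):
    """Return True if every process in the ring finishes its send and receive.

    synchronous:    a send completes only when the receiver is at its receive.
    send_then_recv: the receive is posted only after the send has completed;
                    False models the combined send-receive operation.
    """
    send_done = [False] * n
    recv_done = [False] * n
    mailbox = [0] * n                     # buffered messages awaiting receipt
    progress = True
    while progress:
        progress = False
        for p in range(n):
            q = (p + 1) % n               # right-hand neighbor
            if not send_done[p]:
                q_receiving = (not recv_done[q]
                               and (send_done[q] or not send_then_recv))
                if not synchronous:
                    send_done[p] = True   # message is buffered immediately
                    mailbox[q] += 1
                    progress = True
                elif q_receiving:
                    send_done[p] = recv_done[q] = True   # rendezvous
                    progress = True
            p_receiving = send_done[p] or not send_then_recv
            if not recv_done[p] and p_receiving and mailbox[p] > 0:
                mailbox[p] -= 1
                recv_done[p] = True
                progress = True
    return all(send_done) and all(recv_done)

stuck = ring_exchange_completes(4, synchronous=True, send_then_recv=True)
combined = ring_exchange_completes(4, synchronous=True, send_then_recv=False)
buffered = ring_exchange_completes(4, synchronous=False, send_then_recv=True)
```

With synchronous sends and a send-then-receive coding, every process waits for a neighbor that is itself still sending, so no step is ever enabled. Posting the send and the receive together, or buffering the sends, lets the exchange complete in this model.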

Finally, MPI provides a function MPI_BARRIER(comm) that is used to force
all processes in the communicator comm to synchronize at a certain point.

2.2 Diffusion2d 

Diffusion2d is a parallel program that computes the evolution in time of a
discretized function u defined on a 2-dimensional domain and governed by the
diffusion equation. It is the “teacher’s solution” to a programming project for
a course in Parallel Programming (taught by Andrew Siegel) at the University
of Chicago. Though quite simple, it has many features common in scientific
programs: the physical domain is divided into a “grid” in which one process is
responsible for each section; each process also maintains a number of
“ghost-cells” that mirror the contents of cells on neighboring processes. The program
is written mostly in C, with some Fortran, and consists of 1036 lines spread out
over six files. Slightly more than half this total consists of a generic package for
dealing with grid structures; this package includes the exchangeGhostCells and
grid_write functions defined below.

Our first task was to create a more compact and abstract representation of the 
code which captured the essential communication infrastructure. We follow the 
convention of [17, 6] in using uppercase sans serif type for the general functions, 
and mixed-case typewriter font for their C bindings. We also abbreviate the 
parameter lists by including only the send buffer and destination for a send, and 
the receive buffer and source for a receive, resulting in the following pseudo-code: 

int nprocsx, nprocsy, nxl, nyl, nprint, nsteps;
int myproc, mycoord0, mycoord1;
int upperNabe, lowerNabe, leftNabe, rightNabe;
int[,] u;
int[] send_buf, recv_buf, buf;

void main() {
  int iter = 0;
  // read nprocsx, nprocsy, nxl, nyl, nprint, nsteps
  MPI_Init();
  myproc = MPI_Comm_Rank();
  mycoord0 = myproc % nprocsx; mycoord1 = myproc / nprocsx;
  upperNabe = mycoord0 + nprocsx*((mycoord1 + 1)%nprocsy);
  lowerNabe = mycoord0 + nprocsx*((mycoord1 + nprocsy - 1)%nprocsy);
  leftNabe = (mycoord0 + nprocsx - 1)%nprocsx + nprocsx*mycoord1;
  rightNabe = (mycoord0 + 1)%nprocsx + nprocsx*mycoord1;
  send_buf = new int[nxl]; recv_buf = new int[nxl]; buf = new int[nxl];
  u = new int[nxl+2,nyl+2];
  setInitialValues();
  exchangeGhostCells();
  MPI_Barrier();
  for (iter = 1; iter <= nsteps; ++iter) {
    update(u);
    exchangeGhostCells();
    if ((iter % nprint) == 0) grid_write();
  }
  MPI_Barrier();
  MPI_Finalize();
}

void exchangeGhostCells() {
  for (int i = 1; i <= nxl; ++i) send_buf[i-1] = u[i,1];
  MPI_Sendrecv(send_buf, lowerNabe, recv_buf, upperNabe);
  for (int i = 1; i <= nxl; ++i) u[i,nyl+1] = recv_buf[i-1];
  for (int i = 1; i <= nxl; ++i) send_buf[i-1] = u[i,nyl];
  MPI_Sendrecv(send_buf, upperNabe, recv_buf, lowerNabe);
  for (int i = 1; i <= nxl; ++i) u[i,0] = recv_buf[i-1];
  for (int j = 1; j <= nyl; ++j) send_buf[j-1] = u[1,j];
  MPI_Sendrecv(send_buf, leftNabe, recv_buf, rightNabe);
  for (int j = 1; j <= nyl; ++j) u[nxl+1,j] = recv_buf[j-1];
  for (int j = 1; j <= nyl; ++j) send_buf[j-1] = u[nxl,j];
  MPI_Sendrecv(send_buf, rightNabe, recv_buf, leftNabe);
  for (int j = 1; j <= nyl; ++j) u[0,j] = recv_buf[j-1];
}

void grid_write() {
  if (myproc != 0) {
    for (int n = 0; n < nprocsy; ++n)
      if (mycoord1 == n)
        for (int j = 1; j <= nyl; ++j)
          for (int m = 0; m < nprocsx; ++m)
            if (mycoord0 == m) MPI_Send(u[1..nxl,j], 0);
  } else {
    for (int n = 0; n < nprocsy; ++n)
      for (int j = 1; j <= nyl; ++j)
        for (int m = 0; m < nprocsx; ++m) {
          int from_proc = m + nprocsx*n;
          if (from_proc != 0) MPI_Recv(buf, from_proc);
          else for (int i = 0; i < nxl; ++i) buf[i] = u[i+1,j];
          disk_write(buf);
        }
  }
  MPI_Barrier();
}

Each process executes its own copy of this code, and all of the variables are
local to this process. One process may take a different execution path through 
the code than another by branching on its rank. The total number of processes 
is specified by a flag to the MPI implementation when the program is executed. 

The first six variables in the code are read from a file at the beginning 
of program execution; they act essentially as the parameters which define the 
geometry of the grid, the number of loop iterations, and the frequency with 
which the data are to be written to disk. (Actually, the original source code 
implicitly requires that nxl = nyl, so there are really only 5 parameters.) The 
processes themselves may be thought of as being arranged in a global M x N 
grid, where M = nprocsx and N = nprocsy. The variable myproc is set to the 
rank r of this process; it is obtained from the MPI infrastructure via the function 
MPI_COMM_RANK. The position of this process on the global grid is (m, n) =
(mycoord0, mycoord1), where 0 ≤ m < M, 0 ≤ n < N, and r = m + nM.

The next four variables are used to store the ranks of the neighboring pro- 
cesses. The grid “wraps” around in both the x- and y-directions, so every process 
has an upper, lower, left, and right neighbor. The four neighbors are not neces- 
sarily distinct. In fact, if nprocsx = 1 then this process will be its own left and 
right neighbor, and if nprocsy = 1 this process will be its own lower and upper 
neighbor. In these cases some of the MPI calls will actually send messages from 
a process to itself; this is allowed by MPI. 
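
The coordinate and neighbor arithmetic can be checked in isolation. The following Python sketch is an illustration only and is not part of the original program; the function name neighbors is an assumption introduced here, while the formulas are copied from the pseudo-code above.

```python
def neighbors(myproc, nprocsx, nprocsy):
    """Return (upper, lower, left, right) ranks on the wrapped nprocsx x nprocsy grid."""
    # (m, n) corresponds to (mycoord0, mycoord1) in the pseudo-code.
    m, n = myproc % nprocsx, myproc // nprocsx
    upper = m + nprocsx * ((n + 1) % nprocsy)
    lower = m + nprocsx * ((n + nprocsy - 1) % nprocsy)
    left = (m + nprocsx - 1) % nprocsx + nprocsx * n
    right = (m + 1) % nprocsx + nprocsx * n
    return upper, lower, left, right


print(neighbors(5, 4, 3))  # rank 5 sits at (m, n) = (1, 1) on a 4 x 3 grid
print(neighbors(2, 4, 1))  # a single row: upper and lower neighbor are the process itself
```

With nprocsy = 1 the upper and lower neighbors coincide with the process itself, which is the self-communication case described above.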

The variable u is used to store the values of the function that is evolving 
with time. It stores the values of this function for the coordinates that lie within 
the portion of the grid corresponding to this process. The dimensions of u are 
(nxl + 2) x (nyl + 2) because the top and bottom rows, and the left and right 
columns, are used to store the ghost cells — these are the cells that mirror the 
corresponding cells in the neighboring processes. (The four “corner” positions 
are not used.) The next three variables are used as temporary buffers for the 
MPI communication carried out in exchangeGhostCells and grid_write. 

After initializing the variables, the basic structure of execution is relatively 
simple. First, a ghost-cell exchange is carried out. This updates the ghost cells by
sending out the current values of the boundary cells to the appropriate neighbors
and in turn receiving the values of their boundary cells, and storing the received
values into the ghost-cell positions of u.
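
The net effect of one exchange can be sketched sequentially, without any message passing. The Python below is a hypothetical illustration, not the authors' code: the names make_grids and exchange_ghost_cells and the tagging of each interior cell with (rank, i, j) are assumptions made here so that the origin of every ghost value is visible.

```python
def make_grids(nprocsx, nprocsy, nxl, nyl):
    """Interior cell (i, j) of rank r holds the tag (r, i, j); ghost cells start as None."""
    grids = []
    for r in range(nprocsx * nprocsy):
        u = [[None] * (nyl + 2) for _ in range(nxl + 2)]
        for i in range(1, nxl + 1):
            for j in range(1, nyl + 1):
                u[i][j] = (r, i, j)
        grids.append(u)
    return grids


def exchange_ghost_cells(grids, nprocsx, nprocsy, nxl, nyl):
    """Fill every ghost cell with the value the four Sendrecv calls would deliver."""
    def rank(m, n):
        return (m % nprocsx) + nprocsx * (n % nprocsy)  # wraps in both directions

    for r, u in enumerate(grids):
        m, n = r % nprocsx, r // nprocsx
        upper, lower = grids[rank(m, n + 1)], grids[rank(m, n - 1)]
        left, right = grids[rank(m - 1, n)], grids[rank(m + 1, n)]
        for i in range(1, nxl + 1):
            u[i][nyl + 1] = upper[i][1]   # upper neighbor's row 1
            u[i][0] = lower[i][nyl]       # lower neighbor's row nyl
        for j in range(1, nyl + 1):
            u[nxl + 1][j] = right[1][j]   # right neighbor's column 1
            u[0][j] = left[nxl][j]        # left neighbor's column nxl


grids = make_grids(2, 2, 3, 3)
exchange_ghost_cells(grids, 2, 2, 3, 3)
print(grids[0][1][4])  # a top ghost cell of rank 0, mirrored from rank 2
print(grids[0][0][0])  # a corner position, never written
```

Only ghost positions are written and only interior positions are read, so the order in which the processes are visited does not matter in this sketch.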

Next, a barrier is called, and then the main loop begins. In each iteration, 
first the values of u[1..nxl, 1..nyl] are updated according to a formula derived
from the discretization of the diffusion equation. This is purely a local function — 
it does not involve any MPI communication. Then a ghost-cell exchange takes 
place, and, if iter is divisible by nprint, grid_write is called. 

The function grid_write is somewhat complicated. The goal is to write the 
cells to disk in the proper order: first, the entire global row 0 must be written, 
from left to right, then the same for global row 1, etc. To control the order, the 
data are sent, one local row at a time, from all of the processes of positive rank 
to the process of rank 0, which then writes to the disk. The nested loops encode 
a common MPI idiom for dealing with grid structures: n runs through global 
y-coordinates of the processes; for each n, j runs through the local rows, and
then m runs through the global x-coordinates of the processes. For a process of
positive rank, action is taken only when (n, m) matches the global coordinates of 
this process. 
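
The order in which rank 0 writes rows to disk follows directly from the nesting of the three loops. A minimal Python sketch, assuming a hypothetical helper grid_write_order that records only which process supplies which local row:

```python
def grid_write_order(nprocsx, nprocsy, nyl):
    """List (source rank, local row j) in the order rank 0 writes rows to disk."""
    order = []
    for n in range(nprocsy):             # global y-coordinate of the process
        for j in range(1, nyl + 1):      # local row within that process
            for m in range(nprocsx):     # global x-coordinate of the process
                order.append((m + nprocsx * n, j))
    return order


print(grid_write_order(2, 2, 2))
```

For a 2 x 2 process grid with nyl = 2, each global row is assembled from left to right (ranks 0 and 1, then ranks 2 and 3) before the next row begins.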

Through discussions with the programmer, we arrived at several correctness
properties for Diffusion2d. We describe three of these here: DeadlockFree,
GlobalLockstep, and LocalLockstep. The first says that the program
can never deadlock. The second says that, for all n > 0, no process will call
update(u) at step n unless every process has already completed update(u) at
step n - 1. The programmer pointed out that though he expected this property
to hold, all that was actually required for correctness was the weaker property 
LocalLockstep — this says the same thing, but only for the four neighbors of a 
process. 

In addition to these correctness properties, there were several questions
concerning Diffusion2d that arose in our discussions with the programmer. One is
Q1: Could the removal of the MPI_Barrier statements from the code lead to a
calculation that would not have occurred with the barriers, or could their removal
lead to deadlocks if there were none before? The ability to answer questions of
this sort could be very valuable to scientific programmers: barriers can take a 
huge toll on the performance of a parallel program, but it is often difficult to 
reason about them informally. Another is Q2: Are the final values written to 
disk independent of the interleaving or buffering choices made by the MPI im- 
plementation? The expectation is certainly that this should be so, but it is not 
obvious, a priori, why that must be. 



3 Applying Spin 

3.1 Modeling the MPI Communication Primitives 

Our first task was to model precisely the MPI communication functions. This 
task was simplified somewhat by the fact that Diffusion2d uses only one tag, 
and never makes use of the wildcards MPI_ANY_SOURCE or MPI_ANY_TAG. 
For these reasons, we could simply use one Spin channel chan_i_j, for every 
ordered pair of processes (i,j), to transfer messages from process i to process j. 

Of course we must place a bound on the size of the channels. We call this 
chan_size. A priori, this means that our model fails to be conservative, in that 
the number of pending messages sent from one process to another can never 
exceed chan_size in the model, whereas MPI imposes no such bound. However, 
we will see that in certain cases we can justify a particular choice for chan_size. 

We allow for the possibility that chan_size = 0, i.e., for the case where all 
communication is forced to take place synchronously. In this case the definitions 
of send and receive coincide exactly with those of Spin: 



inline MPI_Send(schan, msg) { schan!msg } /* chan_size = 0 */
inline MPI_Recv(rchan, rbuf) { rchan?rbuf }

A process i wishing to send a message msg to process j calls MPI_Send with
schan = chan_i_j. The receive is used in a similar way, and there rbuf is a 
variable, representing the receive buffer, into which the incoming message will 
be stored. 

If chan_size > 0 the situation is a little more complicated. The semantics of 
Spin channels guarantees that each channel will maintain the messages in FIFO 
order. However, recall that the MPI Standard allows for the MPI implementation 
to choose, at any time, to force a particular communication to synchronize. To 
capture this behavior in our model, we leave the receive procedure as is, but we 
modify the send procedure by inserting a statement that may block until the 
channel is empty: 

inline MPI_Send(schan, msg) { /* chan_size > 0 */
  schan!msg; if :: 1 -> empty(schan) :: 1 fi }

We represent MPI_SENDRECV by allowing the send and receive to happen
in either order. Again, we must also allow for the possibility that the send is 
synchronized. For chan_size > 0 we hence arrive at the following: 

inline MPI_Sendrecv(schan, msg, rchan, rbuf) { /* chan_size > 0 */
  if :: schan!msg -> rchan?rbuf :: rchan?rbuf -> schan!msg fi;
  if :: 1 -> empty(schan) :: 1 fi }

The case chan_size = 0 must be dealt with separately because of the possibility
that schan = rchan, i.e., the case where process i attempts to send/receive
a message to/from itself. The MPI Standard guarantees that the send and re- 
ceive must be able to take place synchronously if there are no pending messages 
(sent from i to i). This situation requires no special handling in the positive 
chan_size case because, if there are no pending messages, the channel will be 
empty, and so the message can first be placed in the channel and then received. 
However, for chan_size = 0, the procedure defined above would always block, 
when in fact it should always be able to complete without blocking. Hence we 
modify the definition in this case so that the message is just placed directly into 
the receive buffer: 

inline MPI_Sendrecv(schan, msg, rchan, rbuf) { /* chan_size = 0 */
  if :: (schan == rchan) -> rbuf = msg :: else ->
    if :: schan!msg -> rchan?rbuf :: rchan?rbuf -> schan!msg fi
  fi }

For chan_size > 0, we will refer to the procedures above as the complex 
communication model. We will also consider the simple communication model, 
in which we just remove the potentially blocking statement from the send and 
send-receive procedures. The effect of this will be that some statements in the 
simple model will not block whereas they might when the actual MPI program is 
executed. However, we will see that in some cases, the simple model will suffice 
for verifying certain properties, and can also improve the performance of Spin. 
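
The difference between the two models can be seen on a two-process example in which each process first sends to the other and then receives. The Python sketch below is a small explicit-state search written for illustration; it is not Spin, and the function can_deadlock, the state encoding, and the restriction to chan_size > 0 are all assumptions made here. Messages are reduced to a count per channel, and in the complex model a sender may nondeterministically choose to wait until its channel is empty, mirroring the optional empty(schan) statement.

```python
from collections import deque


def can_deadlock(programs, chan_size, complex_model):
    """Explore all interleavings; return True if some reachable state is stuck.

    programs[p] is a list of ("send", chan) / ("recv", chan) steps for process p.
    Assumes chan_size > 0 (rendezvous communication is not modeled here).
    """
    chans = sorted({ch for prog in programs for _, ch in prog})
    idx = {ch: k for k, ch in enumerate(chans)}
    nproc = len(programs)
    # State: program counters, the channel each process waits on to become
    # empty (-1 if none), and the number of pending messages per channel.
    start = ((0,) * nproc, (-1,) * nproc, (0,) * len(chans))
    seen, queue = {start}, deque([start])
    while queue:
        pcs, waiting, counts = queue.popleft()
        succs = []
        for p in range(nproc):
            if waiting[p] >= 0:
                if counts[waiting[p]] == 0:  # empty(schan) is now executable
                    succs.append((pcs, waiting[:p] + (-1,) + waiting[p + 1:], counts))
                continue
            if pcs[p] == len(programs[p]):
                continue  # process p has terminated
            op, ch = programs[p][pcs[p]]
            c = idx[ch]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op == "send" and counts[c] < chan_size:
                more = counts[:c] + (counts[c] + 1,) + counts[c + 1:]
                succs.append((next_pcs, waiting, more))
                if complex_model:  # the send may be forced to synchronize
                    succs.append((next_pcs, waiting[:p] + (c,) + waiting[p + 1:], more))
            elif op == "recv" and counts[c] > 0:
                fewer = counts[:c] + (counts[c] - 1,) + counts[c + 1:]
                succs.append((next_pcs, waiting, fewer))
        done = all(pcs[p] == len(programs[p]) and waiting[p] < 0 for p in range(nproc))
        if not succs and not done:
            return True
        for s in succs:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False


# Two processes that each send to the other and then receive.
send_then_recv = [
    [("send", "chan_0_1"), ("recv", "chan_1_0")],
    [("send", "chan_1_0"), ("recv", "chan_0_1")],
]
print(can_deadlock(send_then_recv, chan_size=1, complex_model=False))
print(can_deadlock(send_then_recv, chan_size=1, complex_model=True))
```

With chan_size = 1 the simple model finds no stuck state, whereas the complex model reaches the state in which both senders wait for their channels to empty. This is the kind of blocking that the simple model cannot exhibit.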

Finally, we turn to MPI_BARRIER. There are many standard ways to model
a barrier; we chose a simple “coordinator barrier” approach. This entailed introducing
an extra process with which an ordinary process interacts when it wishes
to enter and exit a barrier. A special synchronous channel is used exclusively for
this purpose. A 0 is sent on this channel when the process enters the barrier,
and then a 1 is sent when the process attempts to exit the barrier. The barrier
function waits to receive one 0 for each original process before accepting any 1s:

chan barrier_chan = [0] of {bit};
inline MPI_Barrier() { barrier_chan!0; barrier_chan!1 }
active proctype Barrier() {
end_b: do :: barrier_chan?0; ...; barrier_chan?0;
             barrier_chan?1; ...; barrier_chan?1 od }

The label end_b is used to indicate to Spin that an execution in which all other 
processes have terminated and the barrier is at position end_b should not be 
considered deadlocked. 

3.2 Modeling Diffusion2d and Properties in Spin 

Having dealt with the model of the MPI infrastructure, we turned to the Diffu- 
sion2d program itself. The first question was how to represent the floating point 
data. However, we observed that none of the properties discussed here depend 
in any way on the actual values of the data. This fact reflects the simplicity 
of the program we have chosen as our example. Had our program contained, 
for example, a conditional statement that branched on an expression involving 
some of the data, we would have had to think carefully about the appropriate 
abstractions. Indeed, we plan on looking at such an example in future work. But 
for this example, we could simply abstract away the data altogether, and we sent 
only the bit 1, representing any message, on the channels. The upshot is that 
the channels in our model keep track only of the number of pending messages 
sent from one process to another. 

We made a few other simple optimizations (for example, leaving out channels 
that would never be used). Otherwise, the Promela code for each process looks
very much like the pseudo-code for Diffusion2d given above. To make it easy 
to experiment with different choices of the parameters and other options, we 
wrote a simple Java program that takes as input those choices, and outputs 
the corresponding Spin model. The 5 parameters that come from the program 
itself are nprocsx, nprocsy, nxl, nsteps, and nprint. The additional choices 
affecting the model include chan_size, the choice between the simple or complex 
communication model (if chan_size > 0), and the choice of whether or not to 
include the barriers. 

We now explain how we expressed the lockstep properties. (Spin already has 
a built-in capacity to check for deadlock.) Let far(i, j) be the predicate which is
true precisely when process i is at the position just before the call to update(u)
and the value of iter in process i minus the value of iter in process j is
greater than 1. This can be expressed in Spin by making the two iter variables
global, inserting a label (Calculate) at the appropriate positions in the code, 
and adding the following: 

#define far_i_j (proc_i@Calculate && (iter_i - iter_j > 1))

This predicate represents the undesirable behavior; let Lockstep(i, j) be the
claim that far(i, j) is always false. We can use Spin to check this property by
including the never claim generated from the LTL formula <>far_i_j.

Now, GlobalLockstep is equivalent to the conjunction of the Lockstep(i, j)
over all (i, j) for which i ≠ j. To check this property, we could either verify the
Lockstep(i, j) individually, or we could define p to be the disjunction of the
far(i, j) and check the single never claim generated from <>p. The first approach
may allow Spin to scale further, though the second is probably more efficient
for small configurations. Similar comments apply to LocalLockstep, which is
the conjunction over all (i, j) for which i and j are neighbors and i ≠ j.

There is another approach which is more specific but could help the model 
checker further by reducing the number of states that need to be explored. Let 
calc(i, n) be the predicate which is true precisely when process i is at the position
just before the call to update(u) and the value of iter in process i is n:

#define calc_i_n (proc_i@Calculate && (iter_i == n))

For n > 1, let Lockstep(i, j; n) be the claim that, on any execution, calc(i, n)
must be preceded by calc(j, n - 1). Hence Lockstep(i, j) is equivalent to the
conjunction, over all n, of the Lockstep(i, j; n). To check this property with
Spin, we use the never claim generated from the LTL formula (!q) U p, where
p = calc(i, n) and q = calc(j, n - 1). The potential advantage arises from the fact
that the search can be truncated as soon as q becomes true when p is false.
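
The meaning of Lockstep(i, j; n) on a single execution can be sketched as a scan over the sequence of arrivals at the Calculate label. This Python fragment is illustrative only; the trace representation and the function name violates_lockstep are assumptions made here and are not part of the Spin model.

```python
def violates_lockstep(trace, i, j, n):
    """True if calc(i, n) occurs before any calc(j, n - 1) in the trace.

    trace is a list of (process, iter) pairs, one per arrival at the
    Calculate label; a True result corresponds to a match of (!q) U p.
    """
    for proc, it in trace:
        if (proc, it) == (j, n - 1):
            return False  # q became true while p was still false: stop searching
        if (proc, it) == (i, n):
            return True   # p became true before q ever held
    return False          # p never occurred on this trace


# Hypothetical trace: process 2 reaches step 2 before process 0 reaches step 1.
trace = [(3, 1), (2, 1), (1, 1), (2, 2)]
print(violates_lockstep(trace, 2, 0, 2))
print(violates_lockstep(trace, 2, 1, 2))
```

The early return on q reflects the truncation mentioned above: once calc(j, n - 1) has been observed, no continuation of the execution can violate Lockstep(i, j; n).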

3.3 Verification with Spin 

We instantiated a large number of models for various choices of the parameters 
and model options, and checked various versions of the properties. We used Spin 
version 4.0.0 of 1 January 2003, running on a Linux box with a 2.2 GHz Xeon 
processor and 4 GB of memory. In all cases we used the Spin options -DSAFETY 
(as ours are all safety properties), -DNOFAIR (as no fairness assumptions were 
needed), -DCOLLAPSE (for better compression), and -DMEMLIM=3000 (to utilize 
all 3 GB of memory available to a single process). To simplify the discussion 
that follows, we will use the term “n x m configuration” to refer to the configuration
with nprocsx = n, nprocsy = m, and, unless explicitly stated otherwise,
nxl = 1 and nprint = nsteps = 2. (All inputs and results are available at
http://laser.cs.umass.edu/~siegel/projects.)

We were able to verify DeadlockFree and LocalLockstep in all cases where 
Spin did not run out of memory. But in verifying GlobalLockstep, a surprising
thing occurred: for certain configurations, Spin found a counterexample. In
general, the existence of a counterexample required nprocsx or nprocsy to be at
least 4, chan_size ≥ 1, and nsteps ≥ 2. In such circumstances, it is possible for
one process to begin its calculation of u_2 (the value of u at time step 2) before
another has begun its calculation of u_1.

To see how this can happen, we outline the trace produced by Spin for the 
4x1 configuration. We may ignore the vertical exchanges of ghost cells as these 
just involve a process communicating with itself. Say all processes have just 
exited the first barrier and are about to enter the for loop. At this point iter 
is 0 in each process. Now process 3 may proceed to call exchangeGhostCells 
and send a message to process 2 as part of its first Sendrecv. Process 2 may also 
proceed to this point, receive this message, and send a message to process 1. At 
this point, process 2 may proceed to its second Sendrecv statement and send a 
message to process 3. Now process 1 may proceed to send a message to process 
0, receive the message from process 2, proceed to its second Sendrecv and send 
a message to process 2. Process 2 can then receive that message, completing its 
participation in the exchangeGhostCells routine, return to the top of the loop, 
and then begin its calculation of u_2. Process 0 has still not entered the loop.
Notice that at this point there will be two pending messages: one sent from 
process 2 to process 3, the other sent from process 1 to process 0. 
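
The communication steps of this trace can be replayed against bounded channels to confirm that each step is enabled with chan_size = 1. The Python below is a hand-written replay for illustration, not output from Spin; the helper names send and recv and the Counter-based channel table are assumptions made here. Only the horizontal exchanges are shown, so "first" and "second" Sendrecv refer to the exchanges with the left and right neighbors.

```python
from collections import Counter

CHAN_SIZE = 1
pending = Counter()  # pending[(src, dst)] = number of messages in flight


def send(src, dst):
    assert pending[(src, dst)] < CHAN_SIZE, "channel full"
    pending[(src, dst)] += 1


def recv(src, dst):
    assert pending[(src, dst)] > 0, "nothing to receive"
    pending[(src, dst)] -= 1


# 4 x 1 configuration: the first Sendrecv sends left and receives from the
# right; the second sends right and receives from the left.
send(3, 2)               # process 3 starts its first Sendrecv
recv(3, 2); send(2, 1)   # process 2 completes its first Sendrecv
send(2, 3)               # process 2 starts its second Sendrecv
send(1, 0); recv(2, 1)   # process 1 completes its first Sendrecv
send(1, 2)               # process 1 starts its second Sendrecv
recv(1, 2)               # process 2 completes its second Sendrecv

in_flight = sorted(ch for ch, k in pending.items() if k > 0)
print(in_flight)
```

At the end of the replay process 2 is free to return to the top of the loop, while process 0 has not taken a single step; the two remaining entries are the pending messages named in the text.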

As remarked earlier, the correctness of Diffusion2d does not depend on this 
property. However, the violation was a surprise to the programmer, who thought 
the synchronization enforced by the sends and receives would force all the pro- 
cesses to stay “close”. In fact, once we have understood this counterexample, it 
is not hard to see that in a strip of 2n processes, processes 0 and n can become n
time steps apart, if chan_size ≥ 1. It appears, however, that for chan_size = 0,
GlobalLockstep holds. 

We did run out of memory for relatively small configurations — for example, 
4 x 4 for DeadlockFree, even with chan_size = 0. We were able to verify the
4 x 3 case with chan_size = 0 and with barriers. In this case there were 3.8 x 10^6
states stored, Spin used 153 MB of memory, and it took slightly over 3 minutes
to execute spin -a and compile and execute the analyzer. Without barriers, the
performance was 1.5 x 10^6 states, 60 MB, and just over 1 minute.

For positive chan_size, the choice of simple or complex communication 
model made a big difference in the size of the state space. Consider, for example,
the 4x2 configuration without barriers and with chan_size = 1. Using the
complex model, there were 2.8 x 10^7 states stored in verifying DeadlockFree;
with the simple model this number was reduced to 9.6 x 10^4.

As we suspected, verification of Lockstep(0, 1; 2) required fewer states than 
for the stronger Lockstep(0, 1). For example, verification in the 3x3 configuration
with chan_size = 1, no barriers, and the simple communication model,
required storing 89,838 states for the former property, and 400,482 states for
the latter. This allowed us to scale Lockstep(0, 1; 2) as far as the 4x4 configuration
(2.5 x 10^7 states, 1655 MB, 29 minutes). Spin was able to find the
counterexample to Lockstep(0, 2; 2) in quite large configurations, at least 7x7
(beyond this point, Spin complains that there are too many channel types).

The lockstep properties provide an example for which the simple communication
model is conservative. For, if we ignore what goes on within a call to
an MPI function, the set of all execution prefixes is the same whether one uses 
the simple or complex model; the difference lies in the fact that some of those 
finite prefixes may be considered deadlocked in the complex model, but not in 
the simple one. Since a lockstep property does not depend on this distinction, it 
will hold under the simple model if, and only if, it holds under the complex one. 

All of the results were the same with and without the barriers. At the very 
least, this provided some evidence that the barriers might not be necessary. 

4 Applying INCA 

INCA takes as input a description of a concurrent program in the S-Expression 
Design Language (SEDL) together with a property expressed in the INCA query 
language [14]. (The query actually describes a violation of the property, much like
the never claims in Spin.) From these it produces an Integer Linear Programming 
(ILP) problem which can be analyzed by standard linear programming tools such 
as CPLEX. If the ILP problem has no solution, the property is guaranteed to 
hold on all executions of the program. On the other hand, a solution to the 
ILP problem may or may not correspond to an actual counterexample to the 
property; if not, then there are ways to augment the ILP problem to increase the 
precision of the model. A constrained search using the values in a solution to the 
ILP problem as the counts of events in an execution can be used to determine 
whether a solution corresponds to a counterexample. 

SEDL resembles a subset of Ada in a Lisp-like syntax. One defines tasks, 
which have their own local variables, execute concurrently, and communicate 
via rendezvous. SEDL does not explicitly provide for buffered communication, 
nor for shared variables (which is essentially what the Spin channels are). For 
these reasons, for our initial experiments with INCA we restricted ourselves to 
the case chan_size = 0. 

As with Spin, our first task was to model the MPI primitives. To do this, 
each task j declares entries chan_i_j. Process i sends a message to process j by
calling that entry; process j receives from i by issuing an accept on that entry.
MPI_SENDRECV is modeled using the SEDL select statement, which is
like the Spin if statement. For example, an MPI_SENDRECV issued in process
0, sending to process 6 and receiving from 3, would be represented as follows: 

(select (when t (call proc_6 chan_0_6) (accept chan_3_0)) 

(when t (accept chan_3_0) (call proc_6 chan_0_6))) 

This even works in the case where a process send-receives itself as INCA allows 
a task to call one of its own entries. The barrier was modeled exactly as with 
Spin, and again, we abstracted away the floating point data to produce a model 
much like the one used for Spin. 

There is a standard INCA deadlock query, and the queries for Lockstep(i, j) 
and Lockstep(i, j; n) are just like the corresponding never claims in Spin. There 
is no easy way to represent the conjunction of two properties as a single property
in INCA (this is because one cannot easily represent the disjunction of two ILP 
problems as a single ILP problem) and so, unlike the case for Spin, we could not 
check all parts of GlobalLockstep or LocalLockstep all at once. 

We instantiated models for the same configurations that we had used for 
Spin, and used INCA version 3.5 and CPLEX Optimizer 8.1.0 on the same 
hardware. For INCA, the limiting factor is more often time than memory: the 
ILP problems generated by INCA can often be solved by CPLEX very quickly, 
but sometimes CPLEX will run for hours without reaching a conclusion; in these 
cases we say we “ran out of time.” 

We were able to verify all of the properties whenever we did not run out of 
time. (Recall that the violation to GlobalLockstep requires a positive value 
for chan_size.) For DeadlockFree, the choice of whether or not to include the 
barriers made a big difference in how far we could scale. With the barriers we 
ran out of time on the 3x3 grid. Without them we could scale as far as 8 x 8, 
and the time required (which includes the time to run INCA and CPLEX) grew 
roughly exponentially in the number of processes: 33 seconds for the 5x5 grid, 
5 minutes for 6 x 6, 54 minutes for 7 x 7, 2,577 minutes for 8x8. 

We were able to verify Lockstep(0, 1; 2) and Lockstep(0, 2; 2) for very large 
grids as well (at least 12 x 12), with and without barriers. For even the largest 
of these, the analysis time is under one minute. 



4.1 Using INCA for Buffered Communication 

In order to incorporate buffers into our INCA model, we created a separate 
channel task chan_i_j for each pair of processes. Since we are only keeping track 
of the number of messages in a channel, each channel task contains one integer 
variable (len) which is incremented or decremented as messages are deposited 
or removed. The sending task calls an entry send in the channel task to deposit 
a message; the receive task calls an entry receive to pick up a message. 

As with Spin we had two channel models: the simple one, in which a send 
blocks only if the channel is full, and the complex one, which might block the 
sender until the message can be received. For the simple model, the definition 
of chan_0_1 appears as follows:

(task chan_0_1
  ((entry send) (entry receive) (variable len chan_range 0))
  ((loop (select
    (when (< len chan_size) (accept send) (assign len (+ len 1)))
    (when (> len 0) (accept receive) (assign len (- len 1)))))))

Now an MPI_SEND from process 0 to 1 becomes simply (call chan_0_1 send),
while the corresponding receive is (call chan_0_1 receive). The send-receive
statement is modeled using the select construct as before. 

For the complex channel model, we modified the channel task by requiring 
the sender to make two calls: the first to an entry send as before, the second 
to an entry complete. The latter has the possibility of blocking until the channel
becomes empty, i.e., until the message has been received. The definition of
MPI_SENDRECV must also be modified as follows:

(select 

(when t (call chan_0_6 send) 

(select 

(when t (call chan_0_6 complete) (call chan_3_0 receive)) 
(when t (call chan_3_0 receive) (call chan_0_6 complete)))) 

(when t (call chan_3_0 receive) 

(call chan_0_6 send) (call chan_0_6 complete))) 

In our first attempts applying this approach to the lockstep properties, we 
obtained many spurious solutions with disconnected cycles in the channel tasks. 
However, we were able to take advantage of the special structure of the channel 
tasks to create a very efficient form of the cycle-elimination procedure described 
in [15], and this provided enough precision for a conclusive result in all cases. 

Using the simple communication model with chan_size = 1, we were able to 
verify Lockstep(0, 1; 2) and find the counterexamples to Lockstep(0, 2; 2), each 
in configurations up to size 12 x 12. The times for the 12 x 12 grid were 2 and 
43 minutes, respectively. We were even able to use INCA to find an execution 
prefix, for a 12 x 12 grid, in which two processes become 6 time steps apart. 



5 Theoretical Results 

Our experience analyzing Diffusion2d led us to begin a more general investiga- 
tion of the properties of MPI programs. Here we give a brief summary of our 
inquiry; the details and proofs appear in [16]. To simplify matters, we focused 
on programs that use only the subset of MPI that occurs in Diffusion2d. While 
these functions represent only a small subset of the MPI library, they are among 
the most fundamental and commonly used MPI functions, and many interesting 
and complex parallel programs can be written using only them. Furthermore, 
we expect that the techniques we have developed to deal with this subset can 
be extended to a much larger portion of MPI, including the collective functions 
such as MPI_BROADCAST and MPI_GATHER.

In order to reason about such programs, we defined a precise notion of a
model M of an MPI program. In essence M consists of an automaton for each
process, and a set of channels (each with a fixed sending and receiving process).
The transitions may be labeled by either local events, or by communication
events. The latter have the form c!a and c?a, where a is a constant. Each state is
either a terminal state (a state with no outgoing transitions, representing process 
termination), a local-event state (all transitions departing from that state are 
local), a sending state (there is only one departing transition and it is labeled 
by a send event), a receiving state (all the departing transitions are labeled 
by receive events), or a send-receive state — a state from which first a send can 
happen and then a receive, or first a receive then the send. 

An execution prefix of M is a sequence of transitions from the various automata
such that (i) the projection onto each automaton is a path starting from
the initial state, and (ii) for each channel, the sequence of sends and receives
on the channel obey FIFO ordering. A synchronous prefix is one in which every
send is immediately followed by its matching receive. All of the theorems here
also require that M have no wildcard receives; this means that for any state
s, there exists a channel c such that every receive transition departing from s 
has a label of the form c?a for some a. This corresponds to a program that uses 
neither MPI_ANY_TAG nor MPI_ANY_SOURCE. 

An execution prefix results in a potential deadlock if (i) at least one process 
is not at a terminal state, (ii) no process is at a local state, and (iii) if a process p 
is at a receiving or send-receive state, then for all channels c for which there is a 
receive transition leaving that state, there are no pending messages on c and no 
process is at a state from which it can execute a send on c. We say “potential” 
because it is not necessarily the case that a program that has followed such a 
path will deadlock. It is only a possibility — whether or not an actual deadlock 
occurs depends on the buffering choices made by the MPI implementation at 
the point just after the end of this prefix: if the implementation decides to 
force all sends to synchronize at this point, the program will deadlock; if it 
decides to buffer one or more sends, the program may not deadlock. Hence the 
potentially deadlocked prefixes are precisely the ones for which some choice by 
a legal MPI implementation would lead to deadlock. Since this is precisely the 
kind of behavior we wish to avoid, we say that M is deadlock-free if it has no 
execution prefix of this form. We say that it is synchronously deadlock-free if it 
has no synchronous execution prefix of this form. We can prove the following: 

Theorem 1. Let M be a model of an MPI program with no wildcard receives. 
Then M is deadlock-free if, and only if, M is synchronously deadlock-free. 

The consequence is that, for such a model, it suffices to check deadlock-freedom with 
chan_size = 0. 
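
To make the definition of a potential deadlock concrete, the three conditions can be written as a predicate over a single configuration. This is only a sketch: the dictionary layout for process states ('kind', 'recv', 'send') and the function name are assumptions introduced for the example, and it inspects one configuration rather than exploring a model.

```python
def is_potential_deadlock(states, pending):
    """Check conditions (i)-(iii) for one configuration.

    states:  one dict per process with keys
             'kind' in {"terminal", "local", "sending", "receiving", "send-receive"},
             'recv' (set of channels with a receive transition leaving the state),
             'send' (channel of the send transition leaving the state, or None).
    pending: dict mapping channel -> number of buffered messages.
    """
    # (i) at least one process has not terminated
    if all(s["kind"] == "terminal" for s in states):
        return False
    # (ii) no process can take a local step
    if any(s["kind"] == "local" for s in states):
        return False
    # channels on which some process is currently able to send
    sendable = {s["send"] for s in states
                if s["kind"] in ("sending", "send-receive")}
    # (iii) no waiting receiver can be served by a pending message or a ready sender
    for s in states:
        if s["kind"] in ("receiving", "send-receive"):
            for chan in s["recv"]:
                if pending.get(chan, 0) > 0 or chan in sendable:
                    return False
    return True

# Two processes that each try to send to the other before receiving.
head_to_head = [
    {"kind": "sending", "recv": set(), "send": "c01"},
    {"kind": "sending", "recv": set(), "send": "c10"},
]
```

The head_to_head configuration satisfies the predicate with no pending messages: if the implementation forces both sends to synchronize, neither process can proceed, whereas buffering either send would let execution continue.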

The next theorem concerns the question of barriers. It states that if a model is 
deadlock-free, it must remain deadlock-free after removing all barrier statements. 
Barriers can be represented in our formalism by adding a coordinator process as 
we did with our Spin and INCA models. If M is a model and B is an appropriate 
set of states from the automata in M, we let M^B denote the model with the 
new barrier process and with barriers added just after the states in B. We have 

Theorem 2. Assume M has no wildcard receives. If M^B is deadlock-free then 
M is deadlock-free. 

We say that M is locally deterministic if it has no wildcard receives and 
every local-event state has exactly one outgoing transition. 

Theorem 3. Suppose M is a locally deterministic model of an MPI program. 
Then there exists an execution prefix S for M with the following property: if T 
is any execution prefix of M, then for all processes p, the projection of T onto 
p is a subsequence of the projection of S onto p, up to possible reordering of the 
send and receive parts of send-receive statements. 

Corollary 1. Suppose M is a locally deterministic model of an MPI program. 
Then M is deadlock-free if and only if there exists a synchronous execution 
prefix T such that either T is infinite or T is finite and ends with each process 
in a terminal state. 

Now suppose we are given values of the parameters for Diffusion2d, and 
also a choice of platform for which the numeric operations are deterministic 
functions. Then the “full-precision” model M of the program is locally 
deterministic. Therefore Theorem 3 provides an affirmative answer to Q2. As for Q1, 
Theorem 2 shows that if Diffusion2d was deadlock-free with the barriers, it will 
be deadlock-free after the barriers are removed; it must necessarily result in the 
same computation by Theorem 3. Hence the barriers really are unnecessary for 
the correctness of the program. Moreover, to check that Diffusion2d (with the 
given parameters) is deadlock-free, Corollary 1 shows that it suffices to check 
a single execution of M and observe that it terminates normally. So verifying 
DeadlockFree does not really require any model checking at all (though the 
lockstep properties are a different matter). 
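
The idea of checking a single synchronous execution can be illustrated with a small Python sketch. It is deliberately simplified and is not the procedure used for Diffusion2d: each process is assumed to be a finite straight-line list of steps (so it is locally deterministic and has no wildcard receives), send-receive states are omitted, and the step encoding and function name are hypothetical.

```python
def runs_to_completion(programs):
    """Run one synchronous execution of straight-line processes.

    programs: one list of steps per process; a step is ("local",),
              ("send", chan) or ("recv", chan). Each channel is assumed to
              have a fixed sender and a fixed receiver.
    Returns True if every process reaches the end of its list.
    """
    pc = [0] * len(programs)  # index of the next step of each process

    def current(p):
        return programs[p][pc[p]] if pc[p] < len(programs[p]) else None

    progress = True
    while progress:
        progress = False
        for p in range(len(programs)):
            step = current(p)
            if step is None:
                continue  # process p has terminated
            if step[0] == "local":
                pc[p] += 1
                progress = True
            elif step[0] == "send":
                # A synchronous send fires only together with its matching receive.
                for q in range(len(programs)):
                    if q != p and current(q) == ("recv", step[1]):
                        pc[p] += 1
                        pc[q] += 1
                        progress = True
                        break
    return all(pc[p] == len(programs[p]) for p in range(len(programs)))

# Process 0 sends then receives; process 1 receives then sends.
ordered = [[("send", "c01"), ("recv", "c10")],
           [("recv", "c01"), ("send", "c10")]]
# Both processes send first, so neither synchronous send can complete.
head_to_head = [[("send", "c01"), ("recv", "c10")],
                [("send", "c10"), ("recv", "c01")]]
```

Under these assumptions, runs_to_completion(ordered) succeeds and runs_to_completion(head_to_head) does not; by Theorem 1, the second program is therefore not deadlock-free even if some MPI implementations would buffer the sends and let it finish.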

6 Related Work 

Finite-state verification techniques have been applied to various message-passing 
systems almost from the beginning and Spin, of course, provides built-in sup- 
port for a number of message-passing features. Various models and logics for 
describing message-passing systems (e.g., [12,1]) are an active area of research. 
But only a few investigators have looked specifically at the MPI communica- 
tion mechanisms. Georgelin et al. [5] have described some of the MPI primitives 
in LOTOS, and have used simulation and some model checking to study the 
LOTOS descriptions. Matlin et al. [10] used Spin to verify part of the MPI 
infrastructure. 

Our theorems about MPI programs depend on results about the equivalence 
of different interleavings of events in the execution of a concurrent program. Our 
results are related to the large literature on reduction and atomicity (e.g., [8,2,4]) 
and traces [11]. Most of the work on reduction and atomicity has been concerned 
with reducing sequences of statements in a single process, although Cohen and 
Lamport [2] consider statements from different processes and describe, for 
example, a producer and consumer connected by a FIFO channel, for which their 
results allow them to assume that messages are consumed as soon as they 
are produced. We do not yet fully understand, however, the extent to which our 
results in the MPI setting correspond to their results for TLA. 

Manohar and Martin [9] introduce a notion of slack elasticity for a variant of 
CSP. Essentially, a system is slack elastic if increasing the number of messages 
that can be buffered in its communication channels does not change the behavior 
of the system. Their goal is to obtain information about pipelining for hardware 
design, and the nondeterminism and communication constructs in their formalism 
are somewhat different from ours. The theorems they prove, however, are similar 
in many respects to ones we describe for MPI programs. 

7 Conclusions 

MPI-based software for scientific computation represents an important and grow- 
ing domain of concurrent software, with all the standard problems introduced 
by concurrency. Although FSV techniques can offer solutions to many of these 
problems, various aspects of the MPI framework, such as dynamic changes in 
message buffer size, and the scale of typical MPI programs present substantial 
obstacles to the successful application of FSV methods. In this paper, we have 
applied FSV techniques to a small, but realistic, example of a parallel scientific 
program using MPI in order to check some properties of interest to the program- 
mer. We have shown how to model some of the problematic MPI communication 
constructs, and we have described some theoretical results that simplify the ver- 
ification process. 

For small configurations of the MPI program, both Spin and INCA verified 
the two properties that hold and found counterexamples for the one that 
does not. The programmer was surprised that GlobalLockstep was violated, but 
the violation can occur only when at least one dimension of the grid is greater 
than three and messages are buffered, making the system big enough to be hard 
to reason about informally (although certainly tiny by MPI standards). We do 
not attach much significance to the fact that INCA could handle larger configurations 
than Spin. Although we regard ourselves as reasonably skilled Spin users, we 
are certainly more expert in applying INCA. It may well be that there are more 
efficient ways to model the MPI constructs in Promela, or that different settings 
would significantly improve the performance of Spin. In any case, it would be 
foolish to generalize very far on the basis of analysis of a single program. 

Although MPI programs like the examples described here exhibit significant 
symmetry, we do not expect substantial gains from applying standard symmetry 
methods in model checking. The problem is that these methods cannot reduce 
the state space by a factor of more than the order of the symmetry group, and 
the growth of the symmetry group does not keep pace with the growth of the 
state space as the systems are scaled up. Compositional methods, on the other 
hand, might very well allow for collapsing the system to some minimal config- 
uration, depending on the property being checked, and we hope to investigate 
this approach in the future. 

Our theoretical results help simplify the verification and suggest that other 
results tailored to the MPI domain may increase the range of MPI programs 
to which FSV techniques may be effectively applied. We plan to extend our 
investigation to the more general cases of programs making use of wildcard 
receives, non-blocking sends and receives, and the MPI collective operations. It 
will also be important to find appropriate abstractions for programs where the 
values of floating point data affect the flow of control. 

We thank Andrew Siegel for supplying the Diffusion2d example and for many 
discussions of the problems of developing MPI-based scientific software. We are 
also grateful to Ewing Lusk for clarifying some of the more complicated parts 
of the MPI Standard and for encouraging this work. This research was partially 
supported by the U.S. Army Research Laboratory and the U.S. Army Research 
Office under agreement number DAAD190110564. 

Abstract. Spin [9] is a model checker for the verification of distributed 
systems software. The tool is freely distributed, and often described as 
one of the most widely used verification systems. The Advanced Spin Tu- 
torial is a sequel to [7] and is targeted towards intermediate to advanced 
Spin users. 



1 Introduction 

Spin [2, 3, 4, 5, 9] supports the formal verification of distributed systems code. The 
software was developed at Bell Labs in the formal methods and verification group 
starting in 1980. Spin is freely distributed, and often described as one of the most 
widely used verification systems. It is estimated that between 5,000 and 10,000 
people routinely use Spin. Spin was awarded the ACM Software System Award 
for 2001 [1]. 

The automata-theoretic foundation for Spin is laid by [10]. The very recent 
[5] describes Spin 4.0, the latest version of the tool. 

The Spin software is written in standard ANSI C, and is portable across 
all versions of the UNIX operating system, including Mac OS X. It can also be 
compiled to run on any standard PC running Linux or Microsoft Windows. 

2 Tutorial 

The Advanced Spin Tutorial is a sequel to [7] and is targeted towards interme- 
diate to advanced Spin users. The objective of the Advanced Spin Tutorial is 
to (further) educate the Spin 2004 attendees on model checking technology in 
general and Spin in particular. 

The tutorial starts with a brief overview of the latest additions to Promela, 
the specification language of Spin. It then discusses general patterns for constructing 
efficient Promela models and for using Spin in the most effective way [6]. 
Topics to be discussed include: Spin’s optimisation algorithms, directives and 
options to tune verification runs with Spin, and guidelines for effective Promela 
modelling (e.g. invariance, atomicity, modelling time, lossy channels, fairness, 
optimisation problems) [8]. 

The second part of the tutorial looks in more detail at the theoretical 
underpinnings of Spin, and discusses some of its more recent applications to the 
verification of implementation level systems code, using model extraction 
techniques. Basic and more advanced abstraction techniques for building Spin 
models will also be presented, along with some examples of large applications of 
Spin-based logic model checking. Topics to be discussed include: automata theoretic 
verification, model construction, abstraction and extraction, and application studies. 

After the tutorial, attendees should: 

— be able to construct (more) efficient and effective Promela models; 

— be able to formulate effective properties that can be checked with Spin; 

— have a basic understanding of the theory and algorithms that make Spin 
work efficiently; 

— have a good understanding of the importance of abstraction in model con- 
struction; 

— understand how and when verification models can be extracted from imple- 
mentation level source code. 

1 Introduction 

IF [3,7] is an open validation platform for asynchronous timed systems developed 
at Verimag during the last 5 years. 

The toolbox is built upon a specification language based on communicating 
extended timed automata supporting various communication primitives and dy- 
namic process creation and destruction. This language is expressive enough to 
represent most useful concepts of modeling and programming languages for 
distributed systems (such as SDL, UML, and Java). 

The core of the toolbox consists of a set of model-based validation components 
including exhaustive/interactive simulation, on-the-fly temporal logic model- 
checking, test case generation and optimal path extraction. In order to control 
state explosion, the toolbox provides several static analysis tools operating at 
the source level such as live variable analysis, dead-code elimination and slicing. 
Finally, the toolbox is connected to commercial environments (such as Rational 
Rose, Rhapsody, Objecteering, Object Geode) and may be used for validating 
SDL and UML specifications [1,6]. 

The toolbox has been successfully applied to several case studies, including 
telecommunication protocols, distributed algorithms, real-time controllers, 
manufacturing, and asynchronous circuits [2,5,4]. 

2 Objectives 

The objectives of this tutorial are first, to give a complete presentation of the 
main functionalities of the IF validation environment, and second, to show how 
this environment can be used to experiment on new model-checking techniques. 

Expected attendees are people interested in model-checking techniques, whether 
from the point of view of an (experienced) user, a tool designer, or a researcher. 

3 Summary of Material 

In this tutorial, we will guide participants through the concepts and the use of 
the IF language and the associated tools. More precisely, we will focus on the 
following items: 

Language: In the first part we will provide a survey of the main concepts of 
the IF language. We will focus on both functional features (structure, 
communication, dynamic creation, external code integration) and non-functional 
ones (real-time primitives, resource management, priorities). Moreover, we 
will show how to express properties on IF specifications by means of dedi- 
cated observers. 

Core tools: In this second part we will introduce the toolbox architecture and 
its main components. We will describe the two main APIs: the syntax level 
API (abstract syntax tree) and the semantic level API (state graph). Among 
the tools, we will focus on the static analyser and some of the model-based 
tools (e.g., model checker, test generator, optimal path extractor). 

Front-ends and applications: Finally, the third part will be dedicated to existing 
front-ends to SDL and UML. It will also give an overview of the most relevant 
case studies handled with the IF toolbox. 

The tutorial will be illustrated with examples, on-line demos and comparisons 
with other related tool environments (Spin, CADP, Kronos, Uppaal, etc.). 
Participants will receive CDs with the latest version of the IF toolbox and an example 
repository including the examples used in the tutorial. 